View AVR32UC_4146058.PDF datasheet online --- IC-ON-LINE

Datasheet File OCR Text:

Feature Summary
* * * * * * * * * * *
Small area, high clock frequency. 32-bit load/store AVR32A RISC architecture. 15 general-purpose 32-bit registers. 32-bit Stack Pointer, Program Counter and Link Register reside in register file. Fully orthogonal instruction set. Pipelined architecture allows one instruction per clock cycle for most instructions. Byte, half-word, word and double word memory access. Fast interrupts and multiple interrupt priority levels. Privileged and unprivileged modes enabling efficient and secure Operating Systems. Optional MPU allows for operating systems with memory protection. Innovative instruction set together with variable instruction length ensuring industry leading code density. * DSP extention with saturating arithmetic, and a wide variety of multiply instructions. * Memory Read-Modify-Write instructions. * Optional advanced On-Chip Debug system.
AVR32UC Technical Reference Manual
32002C-AVR32-11/07
AVR32
1. Introduction
AVR32 is a new high-performance 32-bit RISC microprocessor core, designed for cost-sensitive embedded applications, with particular emphasis on low power consumption and high code density. In addition, the instruction set architecture has been tuned to allow for a variety of microarchitectures, enabling the AVR32 to be implemented as low-, mid- or high-performance processors. AVR32 extends the AVR family into the world of 32- and 64-bit applications.
1.1
The AVR family
The AVR family was launched by Atmel in 1996 and has had remarkable success in the 8-and 16-bit flash microcontroller market. AVR32 is complements the current AVR microcontrollers. Through the AVR32 family, the AVR is extended into a new range of higher performance applications that is currently served by 32- and 64-bit processors To truly exploit the power of a 32-bit architecture, the new AVR32 architecture is not binary compatible with earlier AVR architectures. In order to achieve high code density, the instruction format is flexible providing both compact instructions with 16 bits length and extended 32-bit instructions. While the instruction length is only 16 bits for most instructions, powerful 32-bit instructions are implemented to further increase performance. Compact and extended instructions can be freely mixed in the instruction stream.
1.2
The AVR32 Microprocessor Architecture
The AVR32 is a new innovative microprocessor architecture. It is a fully synchronous synthesisable RTL design with industry standard interfaces, ensuring easy integration into SoC designs with legacy intellectual property (IP). Through a quantitative approach, a large set of industry recognized benchmarks has been compiled and analyzed to achieve the best code density in its class of microprocessor architectures. In addition to lowering the memory requirements, a compact code size also contributes to the core's low power characteristics. The processor supports byte and half-word data types without penalty in code size and performance. Memory load and store operations are provided for byte, half-word, word and double word data with automatic sign- or zero extension of half-word and byte data. The C-compiler is closely linked to the architecture and is able to exploit code optimization features, both for size and speed. In order to reduce code size to a minimum, some instructions have multiple addressing modes. As an example, instructions with immediates often have a compact format with a smaller immediate, and an extended format with a larger immediate. In this way, the compiler is able to use the format giving the smallest code size. Another feature of the instruction set is that frequently used instructions, like add, have a compact format with two operands as well as an extended format with three operands. The larger format increases performance, allowing an addition and a data move in the same instruction in a single cycle. Load and store instructions have several different formats in order to reduce code size and speed up execution: * Load/store to an address specified by a pointer register * Load/store to an address specified by a pointer register with postincrement * Load/store to an address specified by a pointer register with predecrement * Load/store to an address specified by a pointer register with displacement
2
32002C-AVR32-11/07
AVR32
* Load/store to an address specified by a small immediate (direct addressing within a small page) * Load/store to an address specified by a pointer register and an index register. The register file is organized as 16 32-bit registers and includes the Program Counter, the Link Register, and the Stack Pointer. In addition, one register is designed to hold return values from function calls and is used implicitly by some instructions. The AVR32 core defines several micro architectures in order to capture the entire range of applications. The microarchitectures are named AVR32A, AVR32B and so on. Different microarchitectures are suited to different end applications, allowing the designer to select a microarchitecture with the optimum set of parameters for a specific application.
1.3
Exceptions and Interrupts
The AVR32 incorporates a powerful exception handling scheme. The different exception sources, like Illegal Op-code and external interrupt requests, have different priority levels, ensuring a well-defined behavior when multiple exceptions are received simultaneously. Additionally, pending exceptions of a higher priority class may preempt handling of ongoing exceptions of a lower priority class. Each priority class has dedicated registers to keep the return address and status register thereby removing the need to perform time-consuming memory operations to save this information. There are four levels of external interrupt requests, all executing in their own context. An interrupt controller does the priority handling of the external interrupts and provides the prioritized interrupt vector to the processor core.
1.4
Java Support
Some AVR32 implementations provide Java hardware acceleration. To reduce gate count, AVR32UC does not implement any such hardware.
1.5
Microarchitectures
The AVR32 architecture defines different microarchitectures, AVR32A and AVR32B. This enables implementations that are tailored to specific needs and applications. The microarchitectures provide different performance levels at the expense of area and power consumption. The AVR32A microarchitecture is targeted at cost-sensitive, lower-end applications like smaller microcontrollers. This microarchitecture does not provide dedicated hardware registers for shadowing of register file registers in interrupt contexts. Additionally, it does not provide hardware registers for the return address registers and return status registers. Instead, all this information is stored on the system stack. This saves chip area at the expense of slower interrupt handling. Upon interrupt initiation, registers R8-R12 are automatically pushed to the system stack. These registers are pushed regardless of the priority level of the pending interrupt. The return address and status register are also automatically pushed to stack. The interrupt handler can therefore use R8-R12 freely. Upon interrupt completion, the old R8-R12 registers and status register are restored, and execution continues at the return address stored popped from stack. The stack is also used to store the status register and return address for exceptions and scall. Executing the rete or rets instruction at the completion of an exception or system call will pop this status register and continue execution at the popped return address.
3
32002C-AVR32-11/07
AVR32
1.6 The AVR32UC architecture
The first implementation of the AVR32A architecture is called AVR32UC. This implementation targets low- and medium-performance applications, and provides an optional, advanced OCD system, no data or instruction caches, and an optional Memory Protection Unit (MPU). Java acceleration is not implemented. AVR32UC provides three memory interfaces, one High Speed Bus (HSB) master for instruction fetch, one HSB bus master for data access, and one HSB slave interface allowing other bus masters to access data RAMs internal to the CPU. Keeping data RAMs internal to the CPU allows fast access to the RAMs, reduces latency and guarantees deterministic timing. Also, power consumption is reduced by not needing a full HSB bus access for memory accesses. A dedicated data RAM interface is provided for communicating with the internal data RAMs. If an optional MPU is present, all memory accesses are checked for privilege violations. If an access is attempted to an illegal memory address, the access is aborted and an exception is taken. The following figure displays the contents of AVR32UC: Figure 1-1.
Interrupt controller interface
Overview of AVR32UC.
Reset interface OCD interface
OCD system
Power/ Reset control
AVR32UC CPU pipeline
MPU
Instruction memory controller High Speed Bus master
High Speed Bus
Data memory controller High Speed Bus slave
High Speed Bus
High Speed Bus master
High Speed Bus
CPU Local Bus
Data RAM interface
CPU Local Bus master
4
32002C-AVR32-11/07
AVR32
1.7 AVR32UC CPU revisions
Two revisions of the AVR32UC CPU currently exist: * Revision 1 implementing revision 1 of the AVR32 architecture. * Revision 2 implementing revision 2 of the AVR32 architecture, and with a faster divider.
Revision 2 of the AVR32UC CPU added the following instructions: * movh Rd, imm * {add, sub, and, or, eor}{cond4}, Rd, Rx, Ry * ld.{sb, ub, sh, uh, w}{cond4} Rd, Rp[disp] * st.{b, h, w}{cond4} Rp[disp], Rs * rsub{cond4} Rd, imm
AVR32UC CPU revision 2 is fully backward-compatible with revision 1, ie. code compiled for revision 1 is binary-compatible with revision 2 CPUs. The Architecture Revision field in the CONFIG0 system register identifies which architecture revision is implemented in a specific device. The "Processor and Architecture"-chapter of the device datasheet identifies the CPU revision used.
5
32002C-AVR32-11/07
AVR32
2. Programming Model
This chapter describes the programming model and the set of registers accessible to the user. It also describes the implementation options in AVR32UC.
2.1
Architectural compatibility
AVR32UC is fully compatible with the Atmel AVR32A architecture.
2.2
2.2.1
Implementation options
Memory protection AVR32UC optionally supports an MPU as specified by the AVR32 architecture. The chip integrator selects whether or not to implement the MPU. Java support AVR32UC does not implement Java hardware acceleration.
2.2.2
2.3
Register file configuration
The AVR32A architecture dictates a specific register file implementation, reproduced below. Figure 2-1.
Application
Bit 31 Bit 0
Register File in AVR32A
Supervisor
Bit 31 Bit 0
INT0
Bit 31 Bit 0
INT1
Bit 31 Bit 0
INT2
Bit 31 Bit 0
INT3
Bit 31 Bit 0
Exception
Bit 31 Bit 0
NMI
Bit 31 Bit 0
PC LR SP_APP R12 R11 R10 R9 R8 INT0PC R7 INT1PC R6 FINTPC R5 SMPC R4 R3 R2 R1 R0 SR
PC LR SP_SYS R12 R11 R10 R9 R8 INT0PC R7 INT1PC R6 FINTPC R5 SMPC R4 R3 R2 R1 R0 SR
PC LR SP_SYS R12 R11 R10 R9 R8 INT0PC R7 INT1PC R6 FINTPC R5 SMPC R4 R3 R2 R1 R0 SR
PC LR SP_SYS R12 R11 R10 R9 R8 INT0PC R7 INT1PC R6 FINTPC R5 SMPC R4 R3 R2 R1 R0 SR
PC LR SP_SYS R12 R11 R10 R9 R8 INT0PC R7 INT1PC R6 FINTPC R5 SMPC R4 R3 R2 R1 R0 SR
PC LR SP_SYS R12 R11 R10 R9 R8 INT0PC R7 INT1PC R6 FINTPC R5 SMPC R4 R3 R2 R1 R0 SR
PC LR SP_SYS R12 R11 R10 R9 R8 INT0PC R7 INT1PC R6 FINTPC R5 SMPC R4 R3 R2 R1 R0 SR
PC LR SP_SYS R12 R11 R10 R9 R8 INT0PC R7 INT1PC R6 FINTPC R5 SMPC R4 R3 R2 R1 R0 SR
2.4
The Status Register
The Status Register (SR) consists of two halfwords, one upper and one lower, see Figure 2-2 on page 7 and Figure 2-3 on page 7. The lower halfword contains the C, Z, N, V and Q flags, as well as the L and T bits, while the upper halfword contains information about the mode and state the processor executes in. The upper halfword can only be accessed from a privileged mode.
6
32002C-AVR32-11/07
AVR32
Figure 2-2.
Bit 31
The Status Register high halfword
Bit 16
-
LC 1 0
-
-
DM
D
-
M2
M1
M0
EM
I3M
I2M FE
I1M
I0M
GM
Bit name Initial value Global Interrupt Mask Interrupt Level 0 Mask Interrupt Level 1 Mask Interrupt Level 2 Mask Interrupt Level 3 Mask Exception Mask Mode Bit 0 Mode Bit 1 Mode Bit 2 Reserved Debug State Debug State Mask Reserved
0
0
0
0
0
0
0
0
1
1
0
0
0
0
1
Figure 2-3.
Bit 15
The Status Register low halfword
Bit 0
0
T 0
0
0
0
0
0
0
0
0
L 0
Q 0
V 0
N 0
Z 0
C 0
Bit name Initial value Carry Zero Sign Overflow Saturation Lock Reserved Scratch Reserved
DM - Debug State Mask If this bit is set, the Debug State is masked and cannot be entered. The bit is cleared at reset, and can both be read and written by software. D - Debug State The processor is in debug state when this bit is set. The bit is cleared at reset and should only be modified by debug hardware, the breakpoint instruction or the retd instruction. Undefined behaviour may result if the user tries to modify this bit using other mechanisms. M2, M1, M0 - Execution Mode These bits show the active execution mode. The settings for the different modes are shown in Table 2-1 on page 8. M2 and M1 are cleared by reset while M0 is set so that the processor is in supervisor mode after reset. These bits are modified by hardware when initiating interrupt or exception processing. Execution of the scall, rets or rete instructions will also change these bits. Undefined behaviour may result if the user tries to modify these bits using the mtsr, ssrf or csrf instructions. If software needs to change these bits, scall, rets or rete should be used, possibly with prior modifications of the stack, to achieve the desired changes in a safe way. Refer to the AVR32 Architecture Manual for the behaviour of these instructions, note especially how the stack is modified after their execution.
7
32002C-AVR32-11/07
AVR32
Table 2-1.
M2 1 1 1 1 0 0 0 0
Mode bit settings
M1 1 1 0 0 1 1 0 0 M0 1 0 1 0 1 0 1 0 Mode Non Maskable Interrupt Exception Interrupt level 3 Interrupt level 2 Interrupt level 1 Interrupt level 0 Supervisor Application
EM - Exception mask When this bit is set, exceptions are masked. Exceptions are enabled otherwise. The bit is automatically set when exception processing is initiated or Debug Mode is entered. Software may clear this bit after performing the necessary measures if nested exceptions should be supported. This bit is set at reset. I3M - Interrupt level 3 mask When this bit is set, level 3 interrupts are masked. If I3M and GM are cleared, INT3 interrupts are enabled. The bit is automatically set when INT3 processing is initiated. Software may clear this bit after performing the necessary measures if nested INT3s should be supported. This bit is cleared at reset. I2M - Interrupt level 2 mask When this bit is set, level 2 interrupts are masked. If I2M and GM are cleared, INT2 interrupts are enabled. The bit is automatically set when INT3 or INT2 processing is initiated. Software may clear this bit after performing the necessary measures if nested INT2s should be supported. This bit is cleared at reset. I1M - Interrupt level 1 mask When this bit is set, level 1 interrupts are masked. If I1M and GM are cleared, INT1 interrupts are enabled. The bit is automatically set when INT3, INT2 or INT1 processing is initiated. Software may clear this bit after performing the necessary measures if nested INT1s should be supported. This bit is cleared at reset. I0M - Interrupt level 0 mask When this bit is set, level 0 interrupts are masked. If I0M and GM are cleared, INT0 interrupts are enabled. The bit is automatically set when INT3, INT2, INT1 or INT0 processing is initiated. Software may clear this bit after performing the necessary measures if nested INT0s should be supported. This bit is cleared at reset. GM - Global Interrupt Mask When this bit is set, all interrupts are disabled. This bit overrides I0M, I1M, I2M and I3M. The bit is automatically set when exception processing is initiated, Debug Mode is entered, or a Java trap is taken. This bit is automatically cleared when returning from a Java trap. This bit is set after reset. 8
32002C-AVR32-11/07
AVR32
T - Scratch bit This bit is not set or cleared implicit by any instruction and the programmer can therefore use this bit as a custom flag to for example signal events in the program. This bit is cleared at reset. L - Lock flag Used by the conditional store instruction. Used to support atomical memory access. Automatically cleared by rete. This bit is cleared after reset. Q - Saturation flag The saturation flag indicates that a saturating arithmetic operation overflowed. The flag is sticky and once set it has to be manually cleared by a csrf instruction after the desired action has been taken. See the Instruction set description for details. V - Overflow flag The overflow flag indicates that an arithmetic operation overflowed. See the Instruction set description for details. N - Negative flag The negative flag is modified by arithmetical and logical operations. See the Instruction set description for details. Z - Zero flag The zero flag indicates a zero result after an arithmetic or logic operation. See the Instruction set description for details. C - Carry flag The carry flag indicates a carry after an arithmetic or logic operation. See the Instruction set description for details.
2.5
System registers
The system registers are placed outside of the virtual memory space, and are only accessible using the privileged mfsr and mtsr instructions. Some of the System Registers can be altered automatically by hardware. The table below lists the system registers specified in AVR32UC. The programmer is responsible for maintaining correct sequencing of any instructions following a mtsr instruction. Table 2-2.
Reg # 0 1 2 3 4 5 6 7
System Registers
Address 0 4 8 12 16 20 24 28 Name SR EVBA ACBA CPUCR ECR RSR_SUP RSR_INT0 RSR_INT1 Function Status Register Exception Vector Base Address Application Call Base Address CPU Control Register Exception Cause Register Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC
9
32002C-AVR32-11/07
AVR32
Table 2-2.
Reg # 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33-63 64 65 66 67 68 69 70 71 72 73
System Registers (Continued)
Address 32 36 40 44 48 52 56 60 64 68 72 76 80 84 88 92 96 100 104 108 112 116 120 124 128 132-252 256 260 264 268 272 276 280 284 288 292 Name RSR_INT2 RSR_INT3 RSR_EX RSR_NMI RSR_DBG RAR_SUP RAR_INT0 RAR_INT1 RAR_INT2 RAR_INT3 RAR_EX RAR_NMI RAR_DBG JECR JOSP JAVA_LV0 JAVA_LV1 JAVA_LV2 JAVA_LV3 JAVA_LV4 JAVA_LV5 JAVA_LV6 JAVA_LV7 JTBA JBCR Reserved CONFIG0 CONFIG1 COUNT COMPARE TLBEHI TLBELO PTBR TLBEAR MMUCR TLBARLO Function Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Return Status Register for Debug Mode Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Return Address Register for Debug Mode Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Reserved for future use Configuration register 0 Configuration register 1 Cycle Counter register Compare register Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC
10
32002C-AVR32-11/07
AVR32
Table 2-2.
Reg # 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103-191 192-255
System Registers (Continued)
Address 296 300 304 308 312 316 320 324 328 332 336 340 344 348 352 356 360 364 368 372 376 380 384 388 392 396 400 404 408 412-764 768-1020 Name TLBARHI PCCNT PCNT0 PCNT1 PCCR BEAR MPUAR0 MPUAR1 MPUAR2 MPUAR3 MPUAR4 MPUAR5 MPUAR6 MPUAR7 MPUPSR0 MPUPSR1 MPUPSR2 MPUPSR3 MPUPSR4 MPUPSR5 MPUPSR6 MPUPSR7 MPUCRA MPUCRB MPUBRA MPUBRB MPUAPRA MPUAPRB MPUCR Reserved IMPL Function Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Bus Error Address Register MPU Address Register region 0 MPU Address Register region 1 MPU Address Register region 2 MPU Address Register region 3 MPU Address Register region 4 MPU Address Register region 5 MPU Address Register region 6 MPU Address Register region 7 MPU Privilege Select Register region 0 MPU Privilege Select Register region 1 MPU Privilege Select Register region 2 MPU Privilege Select Register region 3 MPU Privilege Select Register region 4 MPU Privilege Select Register region 5 MPU Privilege Select Register region 6 MPU Privilege Select Register region 7 MPU Cacheable Register A MPU Cacheable Register B MPU Bufferable Register A MPU Bufferable Register B MPU Access Permission Register A MPU Access Permission Register B MPU Control Register Reserved for future use IMPLEMENTATION DEFINED
SR- Status Register The Status Register is mapped into the system register space. This allows it to be loaded into the register file to be modified, or to be stored to memory. The Status Register is described in detail in Section 2.4 "The Status Register" on page 6.
11
32002C-AVR32-11/07
AVR32
EVBA - Exception Vector Base Address This register contains a pointer to the exception routines. All exception routines start at this address, or at a defined offset relative to the address. Special alignment requirements may apply for EVBA, depending on the implementation of the interrupt controller. Exceptions are described in detail in the AVR32 Architecture Manual. ACBA - Application Call Base Address Pointer to the start of a table of function pointers. Subroutines can thereby be called by the compact acall instruction. This facilitates efficient reuse of code. Keeping this pointer as a register facilitates multiple function pointer tables. ACBA is a full 32 bit register, but the lowest two bits should be written to zero, making ACBA word aligned. Failing to do so may result in erroneous behaviour. CPUCR - CPU Control Register Register controlling the configuration and behaviour of the CPU. The following fields are defined: Table 2-3.
Name LOCEN
CPU control register
Bit Other 16 Reset 0 Description Unused. Read as 0. Should be written as 0. Local Bus Enable. Must be written to 1 to enable the local bus. Any access attempted to the LOCAL section when this bit is cleared will result in a BUS ERROR. Slave Pending Limit. The maximum number of clock cycles the slave interface can have a request pending due to the CPU owning the RAMs. After this period, the CPU will lose arbitrartion for the RAM, and the slave access can proceed. CPU Pending Limit. The maximum number of clock cycles the CPU can have a request pending due to the slave interface owning the RAMs. After this period, the slave interface will lose arbitrartion for the RAM, and the CPU access can proceed. CPU Ownership Period. The number of cycles the CPU is guaranteed to own the RAM after it has won the arbitration for the RAM. No arbitration will be performed during this period. Slave Interface Enable. If this bit is set, the slave interface is enabled. Otherwise, the slave interface is disabled and any slave access will be stalled.
SPL
15:11
16
CPL
10:6
16
COP
5:1
8
SIE
0
1
ECR - Exception Cause Register This register identifies the cause of the most recently executed exception. This information may be used to handle exceptions more efficiently in certain operating systems. The register is updated with a value equal to the EVBA offset of the exception, shifted 2 bit positions to the right. Only the 9 lowest bits of the EVBA offset are considered. As an example, an ITLB miss jumps to EVBA+0x50. The ECR will then be loaded with 0x50>>2 == 0x14. The ECR register is not loaded when an scall, Breakpoint or OCD Stop CPU exception is taken. Note that for interrupts, the offset is given by the autovector provided by the interrupt controller. The resulting ECR value may therefore overlap with an ECR value used by a regular exception. This can be avoided by choosing the autovector offsets so that no such overlaps occur.
12
32002C-AVR32-11/07
AVR32
RSR_DBG - Return Status Register for Debug Mode When Debug mode is entered, the status register contents of the original mode is automatically saved in this register. When the debug routine is finished, the retd instruction copies the contents of RSR_DBG into SR. RAR_DBG - Return Address Register for Debug Mode When Debug mode is entered, the Program Counter contents of the original mode is automatically saved in this register. When the debug routine is finished, the retd instruction copies the contents of RAR_DBG into PC. CONFIG0 / 1 - Configuration Register 0 / 1 Used to describe the processor, its configuration and capabilities. The contents and functionality of these registers is described in detail in Section 2.7 "Configuration Registers" on page 14. COUNT - Cycle Counter Register Can be used as a general counter to time for example execution time. Can also be used together with COMPARE to implement a periodic interrupt for example for an OS timer. The contents and functionality of this register is described in detail in Section 2.6 "COMPARE and COUNT registers" on page 14. COMPARE - Cycle Counter Compare Register Used together with COUNT to implement a periodic interrupt for example for an OS timer. The contents and functionality of this register is described in detail in Section 2.6 "COMPARE and COUNT registers" on page 14. BEAR - Bus Error Address Register Physical address that caused a Data Bus Error. This register is Read Only. Writes are allowed, but are ignored. MPUARn - MPU Address Register n Registers that define the base address and size of the protection regions. Refer to the AVR32 Architecture Manual for details. MPUPSRn - MPU Privilege Select Register n Registers that define which privilege register set to use for the different subregions in each protection region. Refer to the AVR32 Architecture Manual for details. MPUCRA / MPUCRB - MPU Cacheable Register A / B Registers that define if the different protection regions are cacheable. Refer to the AVR32 Architecture Manual for details. MPUBRA / MPUBRB - MPU Bufferable Register A / B Registers that define if the different protection regions are bufferable. Refer to the AVR32 Architecture Manual for details. MPUAPRA / MPUAPRB - MPU Access Permission Register A / B Registers that define the access permissions for the different protection regions. Refer to the AVR32 Architecture Manual for details.
13
32002C-AVR32-11/07
AVR32
MPUCR - MPU Control Register Register that control the operation of the MPU. Refer to the AVR32 Architecture Manual for details.
2.6
COMPARE and COUNT registers
The COUNT register increments once every clock cycle, regardless of pipeline stalls and flushes. The COUNT register can both be read and written. The COUNT register can be used together with the COMPARE register to create a timer with periodic interrupt. The COUNT register is written to zero upon reset and compare match. Incrementation of the COUNT register can not be disabled. The COUNT register will increment even though a compare interrupt is pending. The COMPARE register holds a value that the COUNT register is compared against. The COMPARE register can both be read and written. When the COMPARE and COUNT registers match, a compare interrupt request is generated and COUNT is reset to 0. COUNT will thereafter continue incrementing in the following clock cycle. The interrupt request is routed out to the interrupt controller, which may forward the request back to the processor as a normal interrupt request at a priority level determined by the interrupt controller. Writing a value to the COMPARE register clears any pending compare interrupt requests. The compare and exception generation feature is disabled if the COMPARE register contains the value zero. The COMPARE register is written to zero upon reset. COUNT and COMPARE are clocked by a dedicated clock with the same frequency as the CPU clock. This allows them to operate in some of the sleep modes. They can therefore be used as timers even when the system use sleep modes. Consult the clock system documentation for information on which sleep modes COUNT and COMPARE are operational.
2.7
Configuration Registers
Configuration registers are used to inform applications and operating systems about the setup and configuration of the processor on which it is running, see Figure 2-4 on page 14. AVR32UC implements the following read-only configuration registers. Figure 2-4. Configuration Registers
CONFIG0
31 Processor ID 24 23 20 19 16 15 13 12 10 9 76543210 F J POSDR
Processor Revision
AT
AR
MMUT
CONFIG1
31 IMMU SZ 26 25 DMMU SZ 20 19 ISET 16 15 13 12 10 9 DSET 65 32 0
ILSZ
IASS
DLSZ
DASS
14
32002C-AVR32-11/07
AVR32
Table 2-4 on page 15 shows the CONFIG0 fields. Table 2-4.
Name Processor ID RESERVED Processor revision
CONFIG0 Fields
Bit 31:24 23:20 19:16 Description Specifies the type of processor. This allows the application to distinguish between different processor implementations. Reserved for future use. Specifies the revision of the processor implementation. Architecture type Value Semantic AVR32A Unused in AVR32UC Reserved
AT
15:13
0 1 Other
Architecture Revision Value 0 AR 12:10 1 2 Other MMU type Value 0 MMUT 9:7 1 2 3 Other Semantic None, using direct mapping and no segmentation Unused in AVR32UC Unused in AVR32UC Memory Protection Unit Reserved Revision 1 Revision 2 Reserved Semantic Unused in AVR32UC
Floating-point unit implemented Value F 6 0 1 No FPU implemented Unused in AVR32UC Semantic
Java extension implemented Value J 5 0 1 No Java extension implemented Unused in AVR32UC Semantic
Performance counters implemented Value P 4 0 1 No Performance Counters implemented Unused in AVR32UC Semantic
15
32002C-AVR32-11/07
AVR32
Table 2-4.
Name
CONFIG0 Fields (Continued)
Bit Description On-Chip Debug implemented Value Semantic No OCD implemented OCD implemented
O
3 0 1
SIMD instructions implemented Value S 2 0 1 No SIMD instructions Unused in AVR32UC Semantic
DSP instructions implemented Value D 1 0 1 Unused in AVR32UC DSP instructions implemented Semantic
Memory Read-Modify-Write instructions implemented Value R 0 0 1 Unused in AVR32UC RMW instructions implemented Semantic
Table 2-5 on page 16 shows the CONFIG1 fields. Table 2-5.
Name IMMU SZ DMMU SZ ISET ILSZ IASS DSET DLSZ DASS
CONFIG1 Fields
Bit 31:26 25:20 19:16 15:13 12:10 9:6 5:3 2:0 Description Unused in AVR32UC Specifies the number of MPU entries. Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC Unused in AVR32UC
16
32002C-AVR32-11/07
AVR32
3. Pipeline
3.1 Overview
AVR32UC is a pipelined processor with three pipeline stages: IF, ID and EX. All instructions are issued and complete in order. Some instructions may require several iterations through the EX stage in order to complete. The following figure shows an overview of the AVR32UC pipeline stages. Figure 3-1. The AVR32UC pipeline stages.
MUL
M u ltip ly u n it
IF
ID
R e g file R ead
ALU
R e g file w rite
A L U u n it
P re fe tc h u n it
D e c o d e u n it L o a d -s to re u n it
LS
The follwing abbreviations are used in the figure: * IF - Instruction Fetch * ID - Instruction Decode * EX - Instruction Execute * MUL - Multiplier * ALU - Arithmetic-Logic Unit * LS - Load/Store Unit
3.2
Prefetch unit
The prefetch unit comprises the IF pipestage, and is responsible for feeding instructions to the decode unit. The prefetch unit fetches 32 bits at a time from the instruction memory interface and places them in a FIFO prefetch buffer. At the same time, one instruction, either RISC extended or compact, is fed to the decode stage.
3.3
Decode unit
The decode unit generates the necessary signals in order for the instruction to execute correctly. The ID stage accepts one instruction each clock cycle from the prefetch unit. This instruction is then decoded, and control signals and register file addresses are generated. If the instruction cannot be decoded, an illegal instruction or unimplemented instruction exception is issued. The ID stage also contains a state machine required for controlling multicycle instructions. The ID stage performs the remapping of register file addresses from logical to physical addresses. This is used for remapping the stack pointer register into the SP_APP or SP_SYS registers. 17
32002C-AVR32-11/07
AVR32
3.4 EX pipeline stage
The Execute (EX) pipeline stage performs register file reads, operations on registers and memory, and register file writes. 3.4.1 ALU section The ALU pipeline performs most of the data manipulation instructions, like arithmetical and logical operations. The ALU stage performs the following tasks: * Target address calculation and condition check for change-of-flow instructions. * Condition code checking for conditional instructions. * Address calculation for memory accesses * Writeback address calculation for the LS pipeline. * All flag setting for arithmetical and logical instructions. * The saturation needed by satadd and satsub. * The operation needed by satrnds, satrndu, sats and satu. * Signed and unsigned division 3.4.2 Multiply section All multiply instructions execute in the multiply section. This section implements a 32 by 32 multiplier array, and 16x16, 32x16 and 32x32 multiplications and multiply-accumulates therefore have an issue latency of one cycle. Multiplication of 32 by 32 bits to a 64-bit result require two iterations through the multiplier array, and therefore needs several cycles to complete. This will stall the multiply pipeline until the instruction is complete. A special accumulator cache is implemented in the MUL section. This cache saves the multiplyaccumulate result in dedicated registers in the MUL section, as well as writing them back to the register file. This allows subsequent MAC instructions to read the accumulator value from the cache, instead of from the register file. This will speed up MAC operations by one clock cycle. If a MAC instruction targets a register not found in the cache, one clock cycle is added to the MAC operation, loading the accumulator value from the register file into the cache. In the next cycle, the MAC operation is restarted automatically by hardware. If an instruction, like an add, mul or load, is executed with target address equal to that of a valid cached register, the instruction will update the cache. The accumulator cache can hold one doubleword accumulator value, or one word accumulator value. Hardware ensures that the accumulator cache is kept consistent. If another pipeline section writes to one of the registers kept in the accumulator cache, the cache is updated. The cache is automatically invalidated after reset. 3.4.3 Load-store section The load-store (LS) pipeline is able to read or write one register per clock cycle. The address is calculated by the ALU section. Thereafter the address is passed on to the LS section and output to the memory interface, together with the data to write if the access is a write. If the access is a read, the read data is returned from the memory interface in the same cycle. If the read data requires typecasting or other manipulation like performed by ldins or ldswp, this manipulation is performed in the same cycle. Any load or store multiple registers are decoded by the ID stage and passed on to the EX stage as a series of single load or store word operations.
18
32002C-AVR32-11/07
AVR32
The read-modify-write instructions memc, mems and memt are performed as a non-interruptable sequence of read from and write to memory. The load-store section generates the control signals required to perform this sequence. This sequence takes several clock cycles, so any following instructions requiring the use of the load-store section must stall until the sequence is finished. Following instructions that do not use the load-store section will not have to stall even if the sequence has not finished. Some memory operations to slow memories, such as memories on the HSB bus, may require several clock cycles to perform. If required, the CPU pipeline will stall as long as necessary in order to perform the memory access.
3.5
Support for unaligned addresses
All memory accesses must be performed with the correct alignment according to the data size. The only exception to this is doubleword accesses, which are performed as two word accesses, and therefore can be word-aligned. Any other unaligned memory access will cause an Data Address Exception. Instruction fetches must be halfword aligned. Any other alignment will cause an Instruction Address Exception.
3.6
Forwarding hardware and hazard detection
Since the register file is read and written in the same pipeline stage, no hazards can occur, and no forwarding is necessary. The programmer does not need to take any special considerations regarding data hazards when writing code.
3.7
Event handling
Due to various reasons, the CPU may be required to abort normal program execution in order to handle special, high-priority events. When handling of these events is complete, normal program execution can be resumed. Traditionally, events that are generated internally in the CPU are called exceptions, while events generated by sources external to the CPU are called interrupts. The possible sources of events are listed in Table 3-4 on page 25. The AVR32 has a powerful event handling scheme. The different event sources, like Illegal Opcode and external interrupt requests, have different priority levels, ensuring a well-defined behaviour when multiple events are received simultaneously. Additionally, pending events of a higher priority class may preempt handling of ongoing events of a lower priority class. When an event occurs, the execution of the instruction stream is halted, and execution control is passed to an event handler at an address specified in Table 3-4 on page 25. Most of the handlers are placed sequentially in the code space starting at the address specified by EVBA, with four bytes between each handler. This gives ample space for a jump instruction to be placed there, jumping to the event routine itself. A few critical handlers have larger spacing between them, allowing the entire event routine to be placed directly at the address specified by the EVBA-relative offset generated by hardware. All external interrupt sources have autovectored interrupt service routine (ISR) addresses. This allows the interrupt controller to directly specify the ISR address as an address relative to EVBA. The autovector offset has 14 address bits, giving an offset of maximum 16384 bytes. The target address of the event handler is calculated as (EVBA | event_handler_offset), not (EVBA + event_handler_offset), so EVBA and exception code segments must be set up appropriately.
19
32002C-AVR32-11/07
AVR32
The same mechanisms are used to service all different types of events, including external interrupt requests, yielding a uniform event handling scheme. Each pipeline stage has a pipeline register that holds the exception requests associated with the instruction in that pipeline stage. This allows the exception request to follow the contaminated instruction through the pipeline. Exceptions are detected in two different pipeline stages. The EX stage detects all data-address related exceptions (DTLB Protection and Data Address). All other exceptions, including interrupts, are detected in the ID stage. When an exception is detected in EX, the EX stage and all upstream stages are flushed. Generally, all exceptions, including breakpoint, have the failing instruction as restart address. This allows a fixup exception routine to correct the error and restart the instruction. Interrupts (INT0-3, NMI) have the address of the first non-completed instruction as restart address. 3.7.1 Exceptions and interrupt requests When an event other than scall or debug request is received by the core, the following actions are performed atomically: 1. The pending event will not be accepted if it is masked. The I3M, I2M, I1M, I0M, EM and GM bits in the Status Register are used to mask different events. Not all events can be masked. A few critical events (NMI, Unrecoverable Exception, TLB Multiple Hit and Bus Error) can not be masked. When an event is accepted, hardware automatically sets the mask bits corresponding to all sources with equal or lower priority. This inhibits acceptance of other events of the same or lower priority, except for the critical events listed above. Software may choose to clear some or all of these bits after saving the necessary state if other priority schemes are desired. It is the event source's responsability to ensure that their events are left pending until accepted by the CPU. 2. When a request is accepted, the Status Register and Program Counter of the current context is stored to the system stack. If the event is an INT0, INT1, INT2 or INT3, registers R8-R12 and LR are also automatically stored to stack. Storing the Status Register ensures that the core is returned to the previous execution mode when the current event handling is completed. When exceptions occur, both the EM and GM bits are set, and the application may manually enable nested exceptions if desired by clearing the appropriate bit. Each exception handler has a dedicated handler address, and this address uniquely identifies the exception source. 3. The Mode bits are set to reflect the priority of the accepted event, and the correct register file bank is selected. The address of the event handler, as shown in Table 3-4, is loaded into the Program Counter. The execution of the event handler routine then continues from the effective address calculated. The rete instruction signals the end of the event. When encountered, the Return Status Register and Return Address Register are popped from the system stack and restored to the Status Register and Program Counter. If the rete instruction returns from INT0, INT1, INT2 or INT3, registers R8-R12 and LR are also popped from the system stack. The restored Status Register contains information allowing the core to resume operation in the previous execution mode. This concludes the event handling. 3.7.2 Supervisor calls The AVR32 instruction set provides a supervisor mode call instruction. The scall instruction is designed so that privileged routines can be called from any context. This facilitates sharing of code between different execution modes. The scall mechanism is designed so that a minimal execution cycle overhead is experienced when performing supervisor routine calls from timecritical event handlers. 20
32002C-AVR32-11/07
AVR32
The scall instruction behaves differently depending on which mode it is called from. The behaviour is detailed in the instruction set reference. In order to allow the scall routine to return to the correct context, a return from supervisor call instruction, rets, is implemented. In the AVR32A microarchitecture, scall and rets uses the system stack to store the return address and the status register. 3.7.3 Debug requests The AVR32 architecture defines a dedicated debug mode. When a debug request is received by the core, Debug mode is entered. Entry into Debug mode can be masked by the DM bit in the status register. Upon entry into Debug mode, hardware sets the SR[D] bit and jumps to the Debug Exception handler. By default, debug mode executes in the exception context, but with dedicated Return Address Register and Return Status Register. These dedicated registers remove the need for storing this data to the system stack, thereby improving debuggability. Debug mode is exited by executing the retd instruction. This returns to the previous context.
3.8
3.8.1
Special concerns
System stack Event handling in AVR32UC, like in all AVR32A architectures, uses the system stack pointed to by the system stack pointer, SP_SYS, for pushing and popping R8-R12, LR, status register and return address. Since exception code may be timing-critical, SP_SYS should point to memory addresses in the IRAM section, since the timing of accesses to this memory section is both fast and deterministic. The user must also make sure that the system stack is large enough so that any event is able to push the required registers to stack. If the system stack is full, and an event occurs, the system will enter an UNDEFINED state.
3.8.2
Clearing of pending interrupt requests When an interrupt request is accepted by the CPU, the interrupt handler will eventually be called. The interrupt handler is responsible for performing the required actions so that the requesting module disasserts the interrupt request before the interrupt routine is exited with rete. Failing to do so will cause the interrupt handler to be re-entered after the rete instruction has been executed, since the interrupt request is still active. Different interrupt sources have different ways of disasserting requests, for example reading an interrupt cause register or writing to specific control registers. Refer to the module-specific documentation for information on how to disassert interrupt requests. Disasserting an interrupt request often requires that a bus access is performed to the requesting module. An example of such an access is to read an interrupt cause register. There will be a latency from the execution of the load or store instruction that is to disassert the interrupt request and the actual disassertion of the request. This latency can be caused by the bus system and internal latencies in the interrupting module. It is important that the programmer makes sure that the interrupt request has actually been disasserted before returning from the interrupt with rete. This can usually be ensured by scheduling the code sequence disasserting the interrupt request in such a way that one can be certain that the interrupt request has actually been disasserted before the rete instruction is executed.
21
32002C-AVR32-11/07
AVR32
Code 3-1. Clearing IRQs using code scheduling
// Using scheduling of instructions in the IRQ handler to make sure that the // request has been disasserted before returning from the handler. // Assume that the IRQ is cleared by reading PERIPH_INTCAUSE, r0 points to // this register. irq_handler: ld.w r12, r0[0] // Clear the IRQ rete
The mechanisms and timing required for disasserting an interrupt request from a module is specific to different modules. Usually, the request is disasserted within a few clock cycles after the load or store instruction has been received by the module. In this case, a simple way of making sure that the request has actually been disasserted is to use a data memory barrier ("Data memory barriers" on page 39). The DMB will block the CPU pipeline until the interrupt request has been disasserted. At this point, the rete instruction can safely be executed. Code 3-2. Clearing IRQs using data memory barriers
// Using data memory barriers in the IRQ handler to make sure that the // request has been disasserted before returning from the handler // Assume that the IRQ is cleared by writing a bitmask to PERIPH_INTCLEAR. // r0 points to this register, r1 contains the correct bitmask. irq_handler: st.w r0[0], r1 ld.w r12, r0[0] // data memory barrier rete
The programmer should consult the data sheets for the different peripheral modules to check if special timings or concerns related to disasserting of interrupt requests apply to the specific module. 3.8.3 Masking interrupt requests in peripheral modules Handling an interrupt request involves several operations like pushing of registers to stack and takes several clock cycles. The required operations are controlled by sequencing logic in hardware. This sequencing hardware does not permit that an asserted interrupt request is disasserted while it is in the process of handling this interrupt request. Hardware makes sure that manipulation of the GM and IxM bits in SREG can be performed safely at all times using the mtsr, csrf and ssrf instructions. The programmer does not need to take any special concerns when issuing one of these instructions. All hardware connected to the CPU is implemented in such a way that once an interrupt request is asserted by the hardware, it can only be disasserted by explicit actions by the programmer. Many peripheral modules that are able to assert interrupt requests have control registers or other means of masking one or more of its interrupt requests. For example, a USART can contain an interrupt mask register with individual bits for masking "TX ready" and "RX ready"
22
32002C-AVR32-11/07
AVR32
interrupts. Writing to such a mask register may cause a pending interrupt request from that module to be disasserted. The programmer must at all times make sure that an action that will disassert interrupts at the interrupt source is not performed if it is possible that the interrupt sequencing hardware is in the processing of handling the interrupt request that will be disasserted by the action. It is safe to perform such an action if one of the following is true: * The SREG GM or IxM bit corresponding to the priority of the interrupt request to be masked is set before the action is performed. * It can be guaranteed that the interrupt request being masked by the action is disasserted when the action is initiated and being performed. Code 3-3. Masking IRQs in a peripheral module which may assert an IRQ at any time
// Masking TX_READY IRQ in a peripheral by setting the TXMASK bit in the // IRQMASK register of the peripheral. // Could alternatively mask the SREG IxM bit associated with the IRQ source disassert_periph_tx_irq: ssrf AVR32_SREG_GM mems PERIPH_IRQMASK, PERIPH_TXMASK csrf AVR32_SREG_GM
If the interrupt request is disasserted during the critical clock cycles where the sequencing hardware is active handling this interrupt request, the CPU may enter an UNPREDICTABLE state.
3.9
Entry points for events
Several different event handler entry points exists. In AVR32UC, the reset address is 0x8000_0000. This places the reset address in the boot flash memory area. TLB miss exceptions and scall have a dedicated space relative to EVBA where their event handler can be placed. This speeds up execution by removing the need for a jump instruction placed at the program address jumped to by the event hardware. All other exceptions have a dedicated event routine entry point located relative to EVBA. The handler routine address identifies the exception source directly. AVR32UC uses the ITLB and DTLB protection exceptions to signal a MPU protection violation. ITLB and DTLB miss exceptions are used to signal that an access address did not map to any of the entries in the MPU. TLB multiple hit exception indicates that an access address did map to multiple TLB entries, signalling an error. All external interrupt requests have entry points located at an offset relative to EVBA. This autovector offset is specified by an external Interrupt Controller. The programmer must make sure that none of the autovector offsets interfere with the placement of other code. The autovector offset has 14 address bits, giving an offset of maximum 16384 bytes. Special considerations should be made when loading EVBA with a pointer. Due to security considerations, the event handlers should be located in non-writeable flash memory, or optionally in a privileged memory protection region if an MPU is present. If several events occur on the same instruction, they are handled in a prioritized way. The priority ordering is presented in Table 3-4. If events occur on several instructions at different locations in the pipeline, the events on the oldest instruction are always handled before any events on any
23
32002C-AVR32-11/07
AVR32
younger instruction, even if the younger instruction has events of higher priority than the oldest instruction. An instruction B is younger than an instruction A if it was sent down the pipeline later than A. The addresses and priority of simultaneous events are shown in Table 3-4 on page 25. Some of the exceptions are unused in AVR32UC since it has no MMU, coprocessor interface or floatingpoint unit. The interrupt system requires that an interrupt controller is present outside the core in order to prioritize requests and generate a correct offset if more than one interrupt source exists for each priority level. An interrupt controller generating different offsets depending on interrupt request source is referred to as autovectoring. Note that the interrupt controller should generate autovector addresses that do not conflict with addresses in use by other events or regular program code. The addresses of the interrupt routines are calculated by adding the address on the autovector offset bus to the value of the Exception Vector Base Address (EVBA). The INT0, INT1, INT2, INT3, and NMI signals indicate the priority of the pending interrupt. INT0 has the lowest priority, and NMI the highest priority of the interrupts.
24
32002C-AVR32-11/07
AVR32
Table 3-4.
Priority 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
Priority and handler addresses for events
Handler Address 0x8000_0000 Provided by OCD system EVBA+0x00 EVBA+0x04 EVBA+0x08 EVBA+0x0C EVBA+0x10 Autovectored Autovectored Autovectored Autovectored EVBA+0x14 EVBA+0x50 EVBA+0x18 EVBA+0x1C EVBA+0x20 EVBA+0x24 EVBA+0x28 EVBA+0x2C EVBA+0x30 EVBA+0x100 EVBA+0x34 EVBA+0x38 EVBA+0x60 EVBA+0x70 EVBA+0x3C EVBA+0x40 EVBA+0x44 Name Reset OCD Stop CPU Unrecoverable exception TLB multiple hit Bus error data fetch Bus error instruction fetch NMI Interrupt 3 request Interrupt 2 request Interrupt 1 request Interrupt 0 request Instruction Address ITLB Miss ITLB Protection Breakpoint Illegal Opcode Unimplemented instruction Privilege violation Floating-point Coprocessor absent Supervisor call Data Address (Read) Data Address (Write) DTLB Miss (Read) DTLB Miss (Write) DTLB Protection (Read) DTLB Protection (Write) DTLB Modified Event source External input OCD system Internal MPU Data bus Data bus External input External input External input External input External input CPU MPU MPU OCD system Instruction Instruction Instruction UNUSED UNUSED Instruction CPU CPU MPU MPU MPU MPU UNUSED PC(Supervisor Call) +2 PC of offending instruction PC of offending instruction PC of offending instruction PC of offending instruction PC of offending instruction PC of offending instruction Stored Return Address Undefined First non-completed instruction PC of offending instruction PC of offending instruction First non-completed instruction First non-completed instruction First non-completed instruction First non-completed instruction First non-completed instruction First non-completed instruction First non-completed instruction PC of offending instruction PC of offending instruction PC of offending instruction First non-completed instruction PC of offending instruction PC of offending instruction PC of offending instruction
3.9.1 3.9.1.1
Description of events Reset Exception The Reset exception is generated when the reset input line to the CPU is asserted. The Reset exception can not be masked by any bit. The Reset exception resets all synchronous elements and registers in the CPU pipeline to their default value, and starts execution of instructions at address 0x8000_0000.
SR = reset_value_of_SREG;
25
32002C-AVR32-11/07
AVR32
PC = 0x8000_0000;
All other system registers are reset to their reset value, which may or may not be defined. Refer to the Programming Model chapter for details. 3.9.1.2 OCD Stop CPU Exception The OCD Stop CPU exception is generated when the OCD Stop CPU input line to the CPU is asserted. The OCD Stop CPU exception can not be masked by any bit. This exception is identical to a non-maskable, high priority breakpoint. Any subsequent operation is controlled by the OCD hardware. The OCD hardware will take control over the CPU and start to feed instructions directly into the pipeline.
RSR_DBG = SR; RAR_DBG = PC; SR[M2:M0] = B'110; SR[D] = 1; SR[DM] = 1; SR[EM] = 1; SR[GM] = 1;
3.9.1.3
Unrecoverable Exception The Unrecoverable Exception is generated when an exception request is issued when the Exception Mask (EM) bit in the status register is asserted. The Unrecoverable Exception can not be masked by any bit. The Unrecoverable Exception is generated when a condition has occurred that the hardware cannot handle. The system will in most cases have to be restarted if this condition occurs.
*(--SPSYS) = PC of offending instruction; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x00;
3.9.1.4
TLB Multiple Hit Exception The TLB Multiple Hit Exception is generated when an access hits in multiple MPU regions. This is usually caused by programming error. Used only if an MPU is present.
*(--SPSYS) = PC of offending instruction; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x04;
3.9.1.5
Bus Error Exception on Data Access The Bus Error on Data Access exception is generated when the data bus detects an error condition. This exception is caused by events unrelated to the instruction stream, or by data written to the cache write-buffers many cycles ago. Therefore, execution can not be resumed in a safe way after this exception. The return address placed on stack is unrelated to the operation that
26
32002C-AVR32-11/07
AVR32
caused the exception. The exception handler is responsible for performing the appropriate action.
*(--SPSYS) = PC of first non-issued instruction; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x08; BEAR = failing address
3.9.1.6
Bus Error Exception on Instruction Fetch The Bus Error on Instruction Fetch exception is generated when the data bus detects an error condition. This exception is caused by events related to the instruction stream. Therefore, execution can be restarted in a safe way after this exception, assuming that the condition that caused the bus error is dealt with.
*(--SPSYS) = PC of first non-issued instruction; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x0C;
3.9.1.7
NMI Exception The NMI exception is generated when the NMI input line to the core is asserted. The NMI exception can not be masked by the SR[GM] bit. However, the core ignores the NMI input line when processing an NMI Exception (the SR[M2:M0] bits are B'111). This guarantees serial execution of NMI Exceptions, and simplifies the NMI hardware and software mechanisms. Since the NMI exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After finishing the NMI exception routine, execution should continue at the instruction following the last completed instruction in the instruction stream.
*(--SPSYS) = PC of first noncompleted instruction; *(--SPSYS) = SR; SR[M2:M0] = B'111; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x10;
3.9.1.8
INT3 Exception The INT3 exception is generated when the INT3 input line to the core is asserted. The INT3 exception can be masked by the SR[GM] bit, and the SR[I3M] bit. Hardware automatically sets the SR[I3M] bit when accepting an INT3 exception, inhibiting new INT3 requests when processing an INT3 request. The INT3 Exception handler address is calculated by adding EVBA to an interrupt vector offset specified by an interrupt controller outside the core. The interrupt controller is responsible for providing the correct offset.
27
32002C-AVR32-11/07
AVR32
Since the INT3 exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After finishing the INT3 exception routine, execution should continue at the instruction following the last completed instruction in the instruction stream.
*(--SPSYS) = R8; *(--SPSYS) = R9; *(--SPSYS) = R10; *(--SPSYS) = R11; *(--SPSYS) = R12; *(--SPSYS) = LR; *(--SPSYS) = PC of first noncompleted instruction; *(--SPSYS) = SR; SR[M2:M0] = B'101; SR[I3M] = 1; SR[I2M] = 1; SR[I1M] = 1; SR[I0M] = 1; PC = EVBA | INTERRUPT_VECTOR_OFFSET;
3.9.1.9
INT2 Exception The INT2 exception is generated when the INT2 input line to the core is asserted. The INT2 exception can be masked by the SR[GM] bit, and the SR[I2M] bit. Hardware automatically sets the SR[I2M] bit when accepting an INT2 exception, inhibiting new INT2 requests when processing an INT2 request. The INT2 Exception handler address is calculated by adding EVBA to an interrupt vector offset specified by an interrupt controller outside the core. The interrupt controller is responsible for providing the correct offset. Since the INT2 exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After finishing the INT2 exception routine, execution should continue at the instruction following the last completed instruction in the instruction stream.
*(--SPSYS) = R8; *(--SPSYS) = R9; *(--SPSYS) = R10; *(--SPSYS) = R11; *(--SPSYS) = R12; *(--SPSYS) = LR; *(--SPSYS) = PC of first noncompleted instruction; *(--SPSYS) = SR; SR[M2:M0] = B'100; SR[I2M] = 1; SR[I1M] = 1; SR[I0M] = 1; PC = EVBA | INTERRUPT_VECTOR_OFFSET;
3.9.1.10
INT1 Exception The INT1 exception is generated when the INT1 input line to the core is asserted. The INT1 exception can be masked by the SR[GM] bit, and the SR[I1M] bit. Hardware automatically sets
28
32002C-AVR32-11/07
AVR32
the SR[I1M] bit when accepting an INT1 exception, inhibiting new INT1 requests when processing an INT1 request. The INT1 Exception handler address is calculated by adding EVBA to an interrupt vector offset specified by an interrupt controller outside the core. The interrupt controller is responsible for providing the correct offset. Since the INT1 exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After finishing the INT1 exception routine, execution should continue at the instruction following the last completed instruction in the instruction stream.
*(--SPSYS) = R8; *(--SPSYS) = R9; *(--SPSYS) = R10; *(--SPSYS) = R11; *(--SPSYS) = R12; *(--SPSYS) = LR; *(--SPSYS) = PC of first noncompleted instruction; *(--SPSYS) = SR; SR[M2:M0] = B'011; SR[I1M] = 1; SR[I0M] = 1; PC = EVBA | INTERRUPT_VECTOR_OFFSET;
3.9.1.11
INT0 Exception The INT0 exception is generated when the INT0 input line to the core is asserted. The INT0 exception can be masked by the SR[GM] bit, and the SR[I0M] bit. Hardware automatically sets the SR[I0M] bit when accepting an INT0 exception, inhibiting new INT0 requests when processing an INT0 request. The INT0 Exception handler address is calculated by adding EVBA to an interrupt vector offset specified by an interrupt controller outside the core. The interrupt controller is responsible for providing the correct offset. Since the INT0 exception is unrelated to the instruction stream, the instructions in the pipeline are allowed to complete. After finishing the INT0 exception routine, execution should continue at the instruction following the last completed instruction in the instruction stream.
*(--SPSYS) = R8; *(--SPSYS) = R9; *(--SPSYS) = R10; *(--SPSYS) = R11; *(--SPSYS) = R12; *(--SPSYS) = LR; *(--SPSYS) = PC of first noncompleted instruction; *(--SPSYS) = SR; SR[M2:M0] = B'010; SR[I0M] = 1; PC = EVBA | INTERRUPT_VECTOR_OFFSET;
29
32002C-AVR32-11/07
AVR32
3.9.1.12 Instruction Address Exception The Instruction Address Error exception is generated if the generated instruction memory address has an illegal alignment.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x14;
3.9.1.13
ITLB Miss Exception The ITLB Miss exception is generated when the MPU is enabled and the instruction memory access does not hit in any regions. Used only if an MPU is present.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x50;
3.9.1.14
ITLB Protection Exception The ITLB Protection exception is generated when the instruction memory access violates the access rights specified by the protection region in which the address lies. Used only if an MPU is present.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x18;
3.9.1.15
Breakpoint Exception The Breakpoint exception is issued when the OCD breakpoint input line to the CPU is aseerted, and SREG[DM] is cleared. When entering the exception routine, RAR_DBG points to the breakpoint instruction, and the CPU will enter Debug mode. An external debugger can optionally assume control of the CPU when the Breakpoint Exception is executed. The debugger can then issue individual instructions to be executed in Debug mode. Debug mode is exited with the retd instruction. This passes control from the debugger back to the CPU, resuming normal execution.
RSR_DBG = SR; RAR_DBG = PC; SR[M2:M0] = B'110; SR[D] = 1; SR[DM] = 1; SR[EM] = 1; SR[GM] = 1;
30
32002C-AVR32-11/07
AVR32
PC = EVBA | 0x1C;
3.9.1.16
Illegal Opcode This exception is issued when the core fetches an unknown instruction, or when a coprocessor instruction is not acknowledged. When entering the exception routine, the return address on stack points to the instruction that caused the exception.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x20;
3.9.1.17
Unimplemented Instruction This exception is issued when the core fetches an instruction supported by the instruction set but not by the current implementation. This allows software implementations of unimplemented instructions. When entering the exception routine, the return address on stack points to the instruction that caused the exception. Table 3-5. List of unimplemented instructions.
Comment No SIMD implemented No coprocessors implemented No cache implemented No Java implemented No Java implemented No Java implemented No Java implemented No MMU present No MMU present No MMU present
Privileged Instructions All SIMD instructions All coprocessor instructions cache - perform cache operation incjosp - increment Java stack pointer popjc - pop Java context pushjc - push Java context retj- return from Java mode tlbr - read addressed TLB entry into TLBEHI and TLBELO tlbw - write TLB entry registers into TLB tlbs - search TLB for entry matching TLBEHI[VPN]
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x24;
3.9.1.18
Data Read Address Exception The Data Read Address Error exception is generated if the address of a data memory read has an illegal alignment. 31
32002C-AVR32-11/07
AVR32
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x34;
3.9.1.19
Data Write Address Exception The Data Write Address Error exception is generated if the address of a data memory write has an illegal alignment.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x38;
3.9.1.20
DTLB Read Miss Exception The DTLB Read Miss exception is generated when the MPU is enabled and the data memory read access does not hit in any regions. Used only if an MPU is present.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x60;
3.9.1.21
DTLB Write Miss Exception The DTLB Write Miss exception is generated when the MPU is enabled and the data memory write access does not hit in any regions. Used only if an MPU is present.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x70;
3.9.1.22
DTLB Read Protection Exception The DTLB Protection exception is generated when the data memory read violates the access rights specified by the protection region in which the address lies. Used only if an MPU is present.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x3C;
32
32002C-AVR32-11/07
AVR32
3.9.1.23 DTLB Write Protection Exception The DTLB Protection exception is generated when the data memory write violates the access rights specified by the protection region in which the address lies. Used only if an MPU is present.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x40;
3.9.1.24
Privilege Violation Exception If the application tries to execute privileged instructions, this exception is issued. The complete list of priveleged instructions is shown in Table 3-6 on page 33. When entering the exception routine, the address of the instruction that caused the exception is stacked as the return address.
*(--SPSYS) = PC; *(--SPSYS) = SR; SR[M2:M0] = B'110; SR[EM] = 1; SR[GM] = 1; PC = EVBA | 0x28;
Table 3-6.
List of instructions which can only execute in privileged modes.
Comment Privileged only when accessing upper half of status register
Privileged Instructions csrf - clear status register flag mtsr - move to system register mfsr - move from system register mtdr - move to debug register mfdr - move from debug register rete- return from exception rets - return from supervisor call retd - return from debug mode sleep - sleep ssrf - set status register flag
Privileged only when accessing upper half of status register
3.9.1.25
DTLB Modified Exception Unused in AVR32UC, since it has no MMU. Floating-point Exception Unused in AVR32UC, since it has no FPU. Coprocessor Exception Unused in AVR32UC, since it has no coprocessor interface.
3.9.1.26
3.9.1.27
33
32002C-AVR32-11/07
AVR32
3.9.1.28 Supervisor call Supervisor calls are signalled by the application code executing a supervisor call (scall) instruction. The scall instruction behaves differently depending on which context it is called from. This allows scall to be called from other contexts than Application. When the exception routine is finished, execution continues at the instruction following scall. The rets instruction is used to return from supervisor calls.
If ( SR[M2:M0] == {B'000 or B'001} ) *(--SPSYS) = PC; *(--SPSYS) = SR; PC EVBA | 0x100; SR[M2:M0] B'001; else LR PC + 2; PC EVBA | 0x100;
3.10
Interrupt latencies
The following features in AVR32UC ensure low and deterministic interrupt latency: * Four different interrupt levels and an NMI ensures that the user can efficiently prioritize the interrupt sources. * Long-running instructions such as ldm, stm, pushm, popm, divs and divu will be aborted if an interrupt request is received. The slowest instruction that can not be aborted by a pending interrupt has a worst case issue latency of 5 cycles. This implies that an interrupt request will need to wait at most 5 cycles for an instruction to complete. The fastest instructions need only a single cycle to complete. * Interrupts are autovectored, allowing the CPU to jump directly to the interrupt handler. * When an interrupt of level m is received, the CPU will start stacking register file registers, return address and status register. After this stacking is performed, the CPU will jump to the autovector address of the interrupt of level m. If an interrupt of level n, where n > m, is received during this stacking, the CPU will jump to the autovector address of the interrupt of level n, NOT the autovector address of the original interrupt.
3.10.1
Maximum interrupt latency The maximum interrupt latency can be calculated as follows: Table 3-7.
Source Wait for the slowest instruction to complete Stack register file registers, return address and status register, and jump to autovector target Wait for autovector target instruction to be fetched TOTAL
Maximum interrupt latency
Delay 5 10 1 16
3.10.2
Minimum interrupt latency The minimum interrupt latency of an interrupt request of level m will occur when the CPU is in the process of stacking the registers and return address associated with an interrupt request of level n, where n < m. If the level m interrupt request arrives just as the CPU is about to jump to 34
32002C-AVR32-11/07
AVR32
the the autovector address for the interrupt of level n, the CPU will jump directly to the autovector address of the latest arriving interrupt. In this case, the minimum interrupt latency is as follows: Table 3-8.
Source Jump to autovector target Wait for autovector target instruction to be fetched TOTAL
Minimum interrupt latency - higher priority interrupt preempts lower priority interrupt
Delay 1 1 2
Assuming that the interrupt request arrives when the CPU is in the process of executing program code, the minimum interrupt latency can be calculated as follows: Table 3-9.
Source Wait for the fastest instruction to complete Stack register file registers, return address and status register, and jump to autovector target Wait for autovector target instruction to be fetched TOTAL
Minimum interrupt latency - interrupt received when executing program code
Delay 1 10 1 12
3.11
NMI latency
Non-maskable interrupts (NMI) behave similarly to interrupts, except that they do not automatically push register file registers on the stack. NMI can, similar to interrupts, abort long-running instructions. The maximum NMI latency can be calculated as follows: Table 3-10.
Source Wait for the slowest instruction to complete Stack return address and status register, and jump to autovector target Wait for autovector target instruction to be fetched TOTAL
Maximum NMI latency
Delay 5 4 1 10
35
32002C-AVR32-11/07
AVR32
4. Memory System
AVR32UC implements a 32-bit unsegmented memory space. Regions of this memory space can be protected by an optional MPU. The memory map is as follows: Figure 4-1. The AVR32UC memory map.
H'FFFFFFFF 1GB High Speed Bus space H'C0000000 HSB
1GB Boot Program Memory H'80000000
BOOT
1GB CPU Local Bus Memory H'40000000
LOCAL
1 GB Internal Data RAM H'00000000
IRAM
4.1
Memory sections
The memory map contains four sections, named IRAM, LOCAL, BOOT and HSB. The IRAM section contains the internal EX stage memory, and this memory is mapped from address 0 and upwards. The LOCAL section is mapped from address 0x4000_0000 and is designed for containing device-specific high-speed interfaces, such as floating-point units, encryption hardware or high-speed GPIO ports. Access to the LOCAL space is performed using any ordinary load and store instructions, and is performed in a single clock cycle. Mapping timing-critical devices in the LOCAL section is beneficial as the interface operates with high clock frequency, and its timing is deterministic since it does not need to access a shared bus which may be heavily loaded. The BOOT section starts at address 0x8000_0000, which is the reset address for AVR32UC. This section will typically contain an internal program FLASH, mapped from address 0x8000_0000 and upwards. The HSB section contains the addresses of all modules mapped on the HSB bus. This may include peripherals such as USARTs and external memory interfaces. The memory space is uniform, so program code can execute from the IRAM, BOOT and HSB sections, and data accesses can be performed to any of the these sections. Note that implementations of AVR32UC of may forbid certain accesses to certain memory sections, eg a write to program FLASH mapped into the BOOT section may be forbidden. The LOCAL section is only accessible by the Load-Store Unit in the CPU EX pipeline stage, therefore, code can not be executed from addresses in the LOCAL space. 36
32002C-AVR32-11/07
AVR32
4.2 Memory interfaces
The AVR32UC CPU has three memory interfaces: * IF stage HSB master interface for instruction fetches * EX stage HSB master interface for data accesses into BOOT or HSB sections * EX stage HSB slave interface enabling other parts of the system to access addresses in the IRAM section
4.3
IF stage interface
The single master interface in the IF stage performs instruction fetches. All fetches are performed with word alignment, except for the first fetch after a change-of-flow, which may use halfword alignment. The IF stage can not perform writes, only reads are possible. Reads can be perfomed from all addresses mapped on the HSB bus. Reads are performed as incrementing bursts of unspecified length. The IF stage master interface will stall appropriately to support slow slaves.
4.4
EX stage interfaces
The EX stage separates between CPU accesses to the IRAM section, and accesses to BOOT/HSB. Any access to the IRAM section are performed to dedicated, high-speed RAMs implemented inside the memory controller. These fast RAMs are able to read or write within the cycle they are initiated. This means that a load instruction in EX will have the read-data ready at the end of the clock cycle for writing into the register file.
4.4.1
EX stage HSB master interface Any CPU access to the BOOT or HSB sections will use multiple clock cycles, as dictated by the HSB semantics. Writes to the BOOT or HSB sections can be pipelined, and are performed as a stream of nonsequential transfers, each taking one cycle unless stalled by the slave. If the slave stalls the transfer, the CPU will stall until the slave releases the stall. CPU reads from the BOOT or HSB sections are not pipelined, and transfer of a data therefore takes two clock cycles, one cycle for the address phase, and one cycle for the data phase. The CPU will be stalled in the data phase. EX stage HSB slave interface The AVR32UC CPU provides a slave interface into the high-speed RAMs that are implemented inside the memory controller. This interface enables other parts of the system, like DMA controllers, to write or read data to or from the RAMs. The slave interface support bursts for both reads and writes. If the high-speed RAMs for some reason cannot accept the transfer request, it will reply by stalling the request until it can be serviced. The arbitration priorities between the CPU and the slave interface for the RAMs can be controlled by programming the CPU Control Register (CPUCR). The CPUCR is described in Section 2.5 on page 9. Arbitration is performed according to the following rules: Assuming the memory interface is idle, and no memory transfers have been performed. Whoever requests access to the RAMs will win the arbitration and get access. If both the CPU and the slave interface requests access, the CPU will win. The source that won the arbitration can use the RAMs for as long as they require. If the other source also has a pending request for use of the RAM, this source will have to wait maximum the number of cycles specified by the SPL or CPL fields of CPUCR. The pending source will gain
4.4.2
37
32002C-AVR32-11/07
AVR32
access to the RAMs when the current owner voluntarily releases the RAMs, or after the SPL/CPL timeout period, whichever comes first. If the CPU wins arbitration for the RAMs, the CPU is guaranteed to own the RAM for the period specified by the COP field in CPUCR. Any slave request will be left pending during this period, even if the CPU is not using the RAMs. The following state diagram shows the states in arbitration for the RAM. Figure 4-2. Arbitration between CPU and slave interface for RAMs.
RAM is free
1
5
2
6
3
CPU owns the RAM
Slave I/F owns the RAM
4
The state transitions are as follows:
1: CPU_wants_to_perform_mem_access 2: CPU_access_complete && (been_in_state > CPUCR[COP]) 3: (been_in_state > CPUCR[COP]) && slave_wants_to_perform_mem_access && (slave_been_pending > CPUCR[SPL]) 4: CPU_wants_to_perform_mem_access && (CPU_been_pending > CPUCR[CPL]) 5: slave_wants_to_perform_mem_access && !CPU_wants_to_perform_mem_access 6: slave_access_complete
4.4.3
EX stage local bus interface Any CPU access to the the LOCAL section is completed in a single clock cycle, both for reads and writes. Transfers on this bus can not be stalled. The CPU will never be stalled due to an access to the LOCAL section. Accesses to this section is performed using regular load-store instructions such as for example ldswp.w, ld.w, ld.ub, st.w, stswp.w, ldm or stm. Which devices are mapped in the LOCAL section, and their memory maps, is device-specific. The LOCAL interface must be enabled by the user by programming the LOCEN bit in CPUCR. Accesses to LOCAL memory addresses without first enabling the section will result in a BUS ERROR exception. If the MPU is enabled, accesses to LOCAL will be subject to permission checking. To ensure maximum transfer speed and cycle determinism, any slaves being addressed by the CPU on the local bus must be able to receive and transmit data on the bus at CPU clock speeds. The consequences of this may vary between different slave devices, but for some slave devices it may imply that the slaves have to run at the CPU clock frequency when local bus transfers are
38
32002C-AVR32-11/07
AVR32
being performed. Refer to the device datasheet for information on any relationships between CPU and device clock frequencies imposed by the local bus.
4.5
IRAM Write buffer
The EX stage has a write buffer used to hold data to be written to the IRAM section. The operation of this buffer is usually transparent to the programmer. The programmer should be aware of the following: * The IRAM has a single port, allowing either one read or one write per clock cycle. * The write buffer is pipelined, allowing sequential writes to IRAM to be pipelined without any pipeline stalls. The previous contents of the write buffer is written to the RAM in parallel with the new store data being placed in the write buffer. * Any read instruction to IRAM in EX will be performed immediately, even if a previous store instruction has placed data to store in the write buffer. In this case, the previous store data remains in the write buffer and will be written back to RAM in a later clock cycle. * If a read instruction in EX accesses the same address as the data in the write buffer is to be stored to, the pipeline is stalled for one clock cycle while the write buffer is emptied to RAM. The read will be performed normally in the next clock cycle. * The contents of the write buffer is written to the physical RAM as soon as the memory interface is not used by any instructions. * The state of the write buffer may affect the timing of RMW instructions, see "Read-modifywrite instructions" on page 58 for details
4.6
Memory barriers
Memory barriers are constructs used to enfore memory consitency. Caches and self-modifying code may cause memory to become inconsistent. AVR32UC has a simple pipeline with no caches, so there is usually no need for memory barriers. Mechanisms for memory barriers are present to handle the cases where such barriers are needed.
4.6.1
Instruction memory barriers An instruction memory barrier (IMB) is usually only needed when executing self-modifying code, for example when self-programming program flash. In this case, one must ensure that all levels in the memory hierarchy are consistent. Due to the simple non-cached memory system in AVR32UC, this is usually trivial. The programmer should make sure that an IMB is used if there is a possibility that an instruction to be modified by self-modifying code has already been prefetched by the instruction prefetch unit. In this case, an IMB should be inserted between the instruction modifying the code and the execution of the modified instruction. To make sure that the modified version of the instruction is executed, the prefetch buffer should be flushed between changing the program memory and executing the new version of the program. Any instruction performing a change-of flow, such as return from exception, conditional branches, unconditional branches, subprogram call or return, or instructions writing to PC would implement an IMB in AVR32UC.
4.6.2
Data memory barriers A data memory barrier (DMB) is used to make sure that a data memory access, either a read or write, is actually performed before the rest of the code is executed. Caches, write buffers and
39
32002C-AVR32-11/07
AVR32
bus latency may cause a memory access to be seen by a slave many cycles after it has been executed by the pipeline. In some cases, this may lead to UNPREDICTABLE behavior in the system. One example of this is found in interrupt handlers. One usually wants to make sure that the interrupt request has been cleared before executing the rete instruction, otherwise the same interrupt may be serviced immediately after executing the rete instruction. In this case a DMB must be inserted between the code clearing the interrupt request and the rete. All accesses to HSB space are strongly ordered. This is used to implement DMBs. A DMB after a store to a HSB slave is implemented by performing a dummy read from the same slave. Any critical code after the read will stall until the read has been performed. Consider an interrupt request made by a peripheral. This peripheral will disassert the interrupt request as soon as the interrupt handler has written a specific bitmask to its PERIPH_INTCLEAR register. A read from the same peripheral performs a bus transfer that implements the DMB. The rete instruction can be executed after the DMB. Code 4-1. Clearing IRQs using data memory barriers
// Using data memory barriers in the IRQ handler to make sure that the // request has been disasserted before returning from the handler // Assume that the IRQ is cleared by writing a bitmask to PERIPH_INTCLEAR. // r0 points to this register, r1 contains the correct bitmask. irq_handler: st.w r0[0], r1 ld.w r12, r0[0] // data memory barrier rete
40
32002C-AVR32-11/07
AVR32
5. Memory Protection Unit
The AVR32 architecture defines an optional Memory Protection Unit (MPU). This is a simpler alternative to a full MMU, while at the same time allowing memory protection. The MPU allows the user to divide the memory space into different protection regions. These protection regions have a user-defined size, and starts at a user-defined address. The different regions can have different cacheability attributes and bufferability attributes. Each region is divided into 16 subregions, each of these subregions can have one of two possible sets of access permissions. The MPU does not perform any address translation.
5.1
Memory map in systems with MPU
An AVR32 implemetation with a MPU has a flat, unsegmented memory space. Access permissions are given only by the different protection regions.
5.2
Understanding the MPU
The AVR32 Memory Protection Unit (MPU) is responsible for checking that memory transfers have the correct permissions to complete. If a memory access with unsatisfactory privileges is attempted, an exception is generated and the access is aborted. If an access to a memory address that does not reside in any protection region is attempted, an exception is generated and the access is aborted. The user is able to allow different privilege levels to different blocks of memory by configuring a set of registers. Each such block is called a protection region. Each region has a user-programmable start address and size. The MPU allows the user to program 8 different protection regions. Each of these regions have 16 sub-regions, which can have different access permissions, cacheability and bufferability. The "DMMU SZ" fields in the CONFIG1 system register identifies the number of implemented protection regions, and therefore also the number of MPU registers. An AVR32UC system with caches also have MPU cacheability and bufferability registers. A protection region can be from 4 KB to 4 GB in size, and the size must be a power of two. All regions must have a start address that is aligned to an address corresponding to the size of the region. If the region has a size of 8 KB, the 13 lowest bits in the start address must be 0. Failing to do so will result in UNDEFINED behaviour. Since each region is divided into 16 sub-regions, each sub-region is 256 B to 256 MB in size. When an access hits into a memory region set up by the MPU, hardware proceeds to determine which subregion the access hits into. This information is used to determine whether the access p e r m i s s io n s f o r t h e s u b r e g i o n a r e g iv e n i n M P U A P R A/ M PU B R A / M P U C R A o r i n MPUAPRB/MPUBRB/MPUCRB. If an access does not hit in any region, the transfer is aborted and an exception is generated. The MPU is enabled by writing setting the E bit in the MPUCR register. The E bit is cleared after reset. If the MPU is disabled, all accesses are treated as uncacheable, unbufferable and will not generate any access violations. Before setting the E bit, at least one valid protection region must be defined.
5.2.1
MPU interface registers The following registers are used to control the MPU, and provide the interface between the MPU and the operating system, see Figure 5-1 on page 42. All the registers are mapped into the Sys-
41
32002C-AVR32-11/07
AVR32
tem Register space, their addresses are presented in "System registers" on page 9. They are accessed with the mtsr and mfsr instructions. The MPU interface registers are shown below. The suffix n can have the range 0-7, indicating which region the register is associated with. Figure 5-1. The MPU interface registers
12 11 Base Address 65 Size 10 V
MPUARn
31
MPUPSRn
31 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 P15 P14 P13 P12 P11 P10 P9 P8 P7 P6 P5 P4 P3 P2 P1 P0 876543210 C7 C6 C5 C4 C3 C2 C1 C0 876543210 B7 B6 B5 B4 B3 B2 B1 B0 16 15 AP4 AP3 12 11 AP2 87 AP1 43 AP0 0 10 E 20 19 AP5
MPUCRA / MPUCRB
31
MPUBRA / MPUBRB
31
MPUAPRA / MPUAPRB
31 AP7 28 27 AP6 24 23
MPUCR
31
5.2.1.1
MPU Address Register - MPUARn A MPU Address register is implemented for each of the 8 protection regions. The MPUAR registers specify the start address and size of the regions. The start address must be aligned so that its alignment corresponds to the size of the region. The minimum allowable size of a region is 4 KB, so only bits 31:12 in the base address needs to be specified. The other bits are always 0. Each MPUAR also has a valid bit that specifies if the protection region is valid. Only valid regions are considered in the protection testing. The MPUAR register consists of the following fields: * Base address - The start address of the region. The minimum size of a region is 4KB, so only the 20 most significant bits in the base address needs to be specified. The 12 lowermost base address bits are implicitly set to 0. If protection regions larger than 4 KB is used, the user must write the appropriate bits in Base address to 0, so that the base address is aligned to the size of the region. Otherwise, the result is UNDEFINED.
42
32002C-AVR32-11/07
AVR32
* Size - Size of the protection region. The possible sizes are shown in Table 5-1 on page 43. Table 5-1.
Size B'00000 to B'01010 B'01011 B'01100 B'01101 B'01110 B'01111 B'10000 B'10001 B'10010 B'10011 B'10100 B'10101 B'10110 B'10111 B'11000 B'11001 B'11010 B'11011 B'11100 B'11101 B'11110 B'11111
Protection region sizes implied by the Size field
Region size UNDEFINED 4 KB 8 KB 16 KB 32 KB 64 KB 128 KB 256 KB 512 KB 1 Mb 2 MB 4 MB 8 MB 16 MB 32 MB 64 MB 128 MB 256 MB 512 MB 1 GB 2 GB 4 GB Constraints on Base address None Bit [12] in Size must be 0 Bit [13:12] in Size must be 0 Bit [14:12] in Size must be 0 Bit [15:12] in Size must be 0 Bit [16:12] in Size must be 0 Bit [17:12] in Size must be 0 Bit [18:12] in Size must be 0 Bit [19:12] in Size must be 0 Bit [20:12] in Size must be 0 Bit [21:12] in Size must be 0 Bit [22:12] in Size must be 0 Bit [23:12] in Size must be 0 Bit [24:12] in Size must be 0 Bit [25:12] in Size must be 0 Bit [26:12] in Size must be 0 Bit [27:12] in Size must be 0 Bit [28:12] in Size must be 0 Bit [29:12] in Size must be 0 Bit [30:12] in Size must be 0 Bit [31:12] in Size must be 0
* V - Valid. Set if the protection region is valid, cleared otherwise. This bit is written to 0 by a reset. The region is not considered in the protection testing if the V bit is cleared. 5.2.1.2 MPU Permission Select Register - MPUPSRn A MPU Permission Select register is implemented for each of the 8 protection regions. Each MPUPSR register divides the protection region into 16 subregions. The bitfields in MPUPSR specifies whether each subregion has access permissions as specified by the region entry in either MPUAPRA or MPUAPRB. Table 5-2.
MPUPSRn[P] 0 1
Subregion access permission implied by MPUPSR bitfields
Access permission MPUAPRA[APn] MPUAPRB[APn]
43
32002C-AVR32-11/07
AVR32
5.2.1.3 MPU Cacheable Register A / B- MPUCRA / MPUCRB The MPUCR registers have one bit per region, indicating if the region is cacheable. If the corresponding bit is set, the region is cacheable. The register is written to 0 upon reset. AVR32UC implementations may optionally choose not to implement the MPUCR registers. 5.2.1.4 MPU Bufferable Register A / B- MPUBRA / MPUBRB The MPUBR registers have one bit per region, indicating if the region is bufferable. If the corresponding bit is set, the region is bufferable. The register is written to 0 upon reset. AVR32UC implementations may optionally choose not to implement the MPUBR registers. 5.2.1.5 MPU Access Permission Register A / B - MPUAPRA / MPUAPRB The MPUAPR registers indicate the access permissions for each region. The MPUAPR is written to 0 upon reset. The possible access permissions are shown in Table 5-3 on page 44. Table 5-3.
AP B'0000 B'0001 B'0010 B'0011 B'0100 B'0101 B'0110 B'0111 B'1000 B'1001 B'1010 Other
Access permissions implied by the APn bits
Privileged mode Read Read / Execute Read / Write Read / Write / Execute Read Read / Execute Read / Write Read / Write / Execute Read / Write Read / Write None UNDEFINED Unprivileged mode None None None None Read Read / Execute Read / Write Read / Write / Execute Read Read / Execute None UNDEFINED
5.2.1.6
MPU Control Register - MPUCR The MPUCR controls the operation of the MPU. The MPUCR has only one field: * E - Enable. If set, the MPU address checking is enabled. If cleared, the MPU address checking is disabled and no exceptions will be generated by the MPU.
5.2.2
MPU exception handling This chapter describes the exceptions that can be signalled by the MPU. ITLB Protection Violation An ITLB protection violation is issued if an instruction fetch violates access permissions. The violating instruction is not executed. The address of the failing instruction is placed on the system stack.
5.2.2.1
44
32002C-AVR32-11/07
AVR32
5.2.2.2 DTLB Protection Violation An DTLB protection violation is issued if a data access violates access permissions. The violating access is not executed. The address of the failing instruction is placed on the system stack. ITLB Miss Violation An ITLB miss violation is issued if an instruction fetch does not hit in any region. The violating instruction is not executed. The address of the failing instruction is placed on the system stack. DTLB Miss Violation An DTLB miss violation is issued if a data access does not hit in any region. The violating access is not executed. The address of the failing instruction is placed on the system stack. TLB Multiple Hit Violation An access hit in multiple protection regions. The address of the failing instruction is placed on the system stack. This is a critical system error that should not occur.
5.2.2.3
5.2.2.4
5.2.2.5
5.3
Example of MPU functionality
As an example, consider region 0. Let region 0 be of size 16 KB, thus each subregion is 1KB. Subregion 0 has offset 0-1KB from the base address, subregion 1 has offset 1KB-2KB and so on. MPUAPRA and MPUAPRB each has one field per region. Each subregion in region 0 can get its access permissions from either MPUAPRA[AP0] or MPUAPRB[AP0], this is selected by the subregion's bitfield in MPUPSR0. Let: MPUPSR0 = {0b0000_0000_0000_0000, 0b1010_0000_1111_0101} MPUAPRA = {A, B, C, D, E, F, G, H} MPUAPRB = {a, b, c, d, e, f, g, h} where {A-H, a-h} have legal values as defined in Table 5-3. Thus for region 0: Table 5-4.
Subregion 0 1 2 3 4 5 6 7
Example of access rights for subregions
Access permission h H h H h h h h Subregion 8 9 10 11 12 13 14 15 Access permission H H H H H h H h
45
32002C-AVR32-11/07
AVR32
6. Instruction Cycle Summary
This chapter presents the instructions in AVR32UC CPU, and the number of clock cycle they require to complete. All the instructions in each group behave similarly in the pipeline. The final subchapter presents code examples to illustrate the clock cycle requirements of various code constructs.
6.1
Definitions
The following definitions are presented in the tables below:
6.1.1
Issue An instruction is issued when it leaves the ID stage and enters the EX stage.
6.1.2
Issue latency The issue latency represents the number of clock cycles required between the issue of the instruction and the issue of the following instruction. For some change-of-flow instructions, this includes the cycle penalty caused by the pipeline flush. The issue latency assumes, unless stated otherwise, that the instruction and data memories are able to return an instruction or data in a single cycle, which may not be true for slow program memories or data memories mapped on the HSB bus.
6.2
6.2.1
Special considerations
PC as destination register Most instructions can take PC as destination register. This will result in a jump to the calculated address. The jump is performed when the instruction writing to PC has completed, and oll other effects of the instruction, like updating of pointer registers for loads loading to PC, have been committed. Instructions writing to PC will have an additional issue latency of 2 cycles due to the pipeline flush.
6.3
CPU revision
Revision 1 and revision 2 of the AVR32UC CPU has the same instruction timings, except that the divider in revision 2 is faster than in revision 1. Instructions only present in revision 2 of the CPU are explicitly noted.
6.4
ALU instructions
This group comprises simple single-cycle ALU instructions like add and sub. The conditional subtract and move instructions are also in this group. All instructions in this group, except ssrf to bits 15 to 31, take one cycle to execute, and the result is available for use by the following instruction. Table 6-1.
Mnemonics abs acr adc C C E
ALU instructions
Operands Rd Rd Rd, Rx, Ry Description Absolute value. Add carry to register. Add with carry. Issue latency 1 1 1
46
32002C-AVR32-11/07
AVR32
Table 6-1.
add
ALU instructions
C E Rd, Rs Rd, Rx, (Ry << sa) Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Rd, Rs Rd, Rs Rd, Rs Rd, imm Rd, imm Rd Compare with carry. E Rd, Rs Rd, Rx, Ry Rd, Rx, Ry Rd Rd, Rs Reverse subtract. E Rd, Rs, k8 Rd, imm Rd, Rx, Ry Rd Rd, Rs Rd, Rx, (Ry << sa) Rd, imm Rd, imm Rd, Rs, imm Rd, Rx, Ry Rd, imm Rd, imm Rd Rd, Rs Rd, Rx, Ry << sa Rd, Rx, Ry >> sa Logical AND. Subtract signed halfwords (32 16 - 16) Subtract immediate if condition satisfied. Subtract if condition satisfied. CPU revision 2 and higher only. Test no byte equal to zero. Subtract. Reverse subtract immediate if condition satisfied. CPU revision 2 and higher only. Subtract with carry. Subtract carry from register. 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Return signed maximum Return signed minimum Two's Complement. 1 1 1 1 1 E E C C Compare. Add. Add shifted. Add if condition satisfied. CPU revision 2 and higher only. Add signed halfwords (32 16 + 16) Add with absolute value. Compare byte. Compare halfword. 1 1 1 1 1 1 1 1 1 1 1
add{cond4} addhh.w addabs cp.b cp.h
E C E E E C
cp.w
C E C
cpc max min neg rsub
rsub{cond4} sbc scr
E E C C E
sub
C E E
subhh.w
C E
sub{cond4} E tnbz C C and E E
47
32002C-AVR32-11/07
AVR32
Table 6-1.
and{cond4} andn
ALU instructions
E C E Rd, Rx, Ry Rd, Rs Rd, imm Rd, imm, COH Rd, imm Rd, imm, COH Rd Rd, Rs Rd, Rx, Ry << sa Rd, Rx, Ry >> sa Rd, Rx, Ry Rd, imm Rd, imm Rd, Rs Rd, Rx, Ry << sa Rd, Rx, Ry >> sa Rd, Rx, Ry Rd, imm Rd, imm Rd, Rs Rd, Rs, o5, w5 Rd, Rs, o5, w5 Rd, Rs, o5, w5 Rd, b5 Rd Rd, b5 Rd Rd Rd Logical OR if condition satisfied. CPU revision 2 and higher only. Logical OR (High Halfword). Logical OR (Low Halfword). Test register for zero. Insert the lower w5 bits of Rs in Rd at bit-offset o5. Extract and sign-extend the w5 bits in Rs starting at bit-offset o5 to Rd. Extract and zero-extend the w5 bits in Rs starting at bit-offset o5 to Rd. Bit load. Bit reverse. Bit store. Typecast byte to signed word. Typecast halfword to signed word. Typecast byte to unsigned word. Logical (Inclusive) OR. Logical EOR if condition satisfied. CPU revision 2 and higher only. Logical Exclusive OR (High Halfword). Logical Exclusive OR (Low Halfword). Logical Exclusive OR. Logical AND if condition satisfied. CPU revision 2 and higher only. Logical AND NOT. Logical AND High Halfword, low halfword is unchanged. Logical AND High Halfword, clear other halfword. Logical AND Low Halfword, high halfword is unchanged. Logical AND Low Halfword, clear other halfword. One's Complement (NOT). 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
andh E E andl E com C C eor E E eor{cond4} eorh eorl E E E C or E E or{cond4} orh orl tst bfins bfexts bfextu bld brev bst casts.b casts.h castu.b E E E C E E E E C E C C C
48
32002C-AVR32-11/07
AVR32
Table 6-1.
castu.h cbr clz sbr swap.b swap.bh swap.h
ALU instructions
C C E C C C C E Rd Rd, b5 Rd, Rs Rd, b5 Rd Rd Rd Rd, Rx, Ry Rd, Rs, sa Rd, sa Rd, Rx, Ry Rd, Rs, sa Rd, sa Rd, Rx, Ry Rd, Rs, sa Rd, sa Rd Rd Rd, imm Load immediate into register. Rd, imm Rd, Rs Rd, Rs Rd, imm Rd, imm b5 b5 b5 Rd Copy register. Copy register if condition is true Load immediate into register if condition is true Load immediate into high halfword of register. CPU revision 2 and higher only. Clear status register flag. Copy status register flag to C and Z. Set status register flag. Conditionally set register to true or false Rotate left through carry. Rotate right through carry. Logical shift right. Logical shift left. Arithmetic shift right (signed). Typecast halfword to unsigned word. Clear bit in register. Count leading zeros. Set bit in register. Swap bytes in register. Swap bytes in each halfword. Swap halfwords in register. 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1/3 1
asr
E C E
lsl
E C E
lsr
E C
rol ror
C C C
mov
E C E
mov{cond4} E movh csrf csrfcz ssrf sr{cond4} E C C C C
6.5
Multiply instructions
These instructions require one pass through the multiplier array and produce a 32- or 48-bit result. For mulrndhh, a rounding value of 0x8000 is added to the product producing the final
49
32002C-AVR32-11/07
AVR32
result. This group does not set any flags, except for the mulsat instructions which set Q if saturation occurred. Table 6-2.
Mnemonics E mul E mulhh.w mulnhh.w mulnwh.d mulwh.d E E E E Rd, Rs, imm Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry
Multiply instructions
Operands Rd, Rx, Ry Description Multiply. (32 32 x 32) Multiply immediate. Signed Multiply of halfwords. (32 16 x 16) Signed Multiply of halfwords. (32 16 x 16) Signed Multiply, word and halfword. (48 32 x 16) Signed Multiply, word and halfword. (48 32 x 16) Fractional signed multiply with saturation. Return halfword. (16 16 x 16) Fractional signed multiply with saturation. Return word. (32 16 x 16) Fractional signed multiply with saturation. Return word. (32 32 x 16) Signed multiply with rounding. Return halfword. (16 16 x 16) Signed multiply with rounding. Return halfword. (32 32 x 16) Issue latency 1 1 1 1 1 1
mulsathh.h
E
1
mulsathh.w
E
1
mulsatwh.w
E
Rd, Rx, Ry
1
mulsatrndhh.h
E
Rd, Rx, Ry
1
mulsatrndwh. w
E
Rd, Rx, Ry
1
6.6
MAC instructions
These instructions require one pass through the multiplier array and produce a 32- or 48-bit result. This result is added to an accumulator register. A valid copy of this accumulator may be cached in the accumulator cache. Otherwise, an extra cycle is needed to read the accumulator from the register file. Therefore, issue and result latencies depend on whether the accumulator is cached in the AccCache.
50
32002C-AVR32-11/07
AVR32
This group does not set any flags, except for the macsathh.w instruction which set Q if saturation occurred. Table 6-3.
Mnemonics mac machh.w machh.d E E E
MAC instructions
Operands Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Description Multiply accumulate. (32 32x32 + 32) Multiply signed halfwords and accumulate. (32 16x16 + 32) Multiply signed halfwords and accumulate. (48 16x16 + 48) Multiply signed word and halfword and accumulate. (48 32 x 16 + 48) Fractional signed multiply accumulate with saturation. Return word. (32 16 x 16 + 32) Issue latency 1/2 1/2 1/2
macwh.d
E
1/2
macsathh.w
E
Rd, Rx, Ry
1/2
6.7
MulMac64 instructions
These instructions require two passes through the multiplier array to produce a 64-bit result. For macs.d and macu.d, a valid copy of this accumulator may be cached in the accumulator cache. Otherwise, an extra cycle is needed to read the accumulator from the register file. Therefore, issue and result latencies depend on whether a valid entry is found in the accumulator cache. Table 6-4.
Mnemonics macs.d macu.d muls.d mulu.d E E E E
MulMac64 instructions
Operands Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Description Multiply signed accumulate. (64 32x32 + 64) Multiply unsigned accumulate. (64 32x32 + 64) Signed Multiply. (64 32 x 32) Unsigned Multiply. (64 32 x 32) Issue latency 3/4 3/4 2 2
51
32002C-AVR32-11/07
AVR32
6.8 Divide instructions
These instructions require several cycles in the EX stage to complete. The divs and divu instructions will be aborted immediately if any interrupts are pending, in order to limit the interrupt latency. The divide instructions are faster in revision 2 than in revision 1 of the AVR32UC CPU. Table 6-5.
Mnemonics divs E
Divide instructions
Operands Rd, Rx, Ry Description Divide signed. (32 32/32) (32 32%32) Divide unsigned. (32 32/32) (32 32%32) Issue latency 191
divu
E
Rd, Rx, Ry
191
1.) 35 cycles in revision 1 of the CPU
6.9
Saturate instructions
These instructions perform arithmetic operations with possible saturation. Table 6-6.
Mnemonics satadd.h satadd.w satsub.h satsub.w E satrnds satrndu sats satu E E E E Rd, Rs, imm Rd >> sa, b5 Rd >> sa, b5 Rd >> sa, b5 Rd >> sa, b5 Signed saturate from bit given by sa after a right shift with rounding of b5 bit positions. Unsigned saturate from bit given by sa after a right shift with rounding of b5 bit positions. Shift sa positions and do signed saturate from bit given by b5. Shift sa positions and do unsigned saturate from bit given by b5. E E E E
Saturate instructions
Operands Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Rd, Rx, Ry Saturated subtract. 1 2 2 1 1 Description Saturated add halfwords. Saturated add. Saturated subtract halfwords. Issue latency 1 1 1 1
6.10
Load and store instructions
This group includes all the load and store instructions. The address calculations are performed by the adder in the EX stage. The EX adder also performs the writeback address calculation for the autoincrement and autodecrement operation. Loaded data are available at the end of the cycle in the EX stage. Byte and halfword data must be extended and rotated before they are valid. This is performed in the EX stage. Ldins and ldswp instructions also require modification in the EX stage before their results are valid. Stswp instructions require modification before their data is output to the memory interface. This modification is performed in the EX stage. 52
32002C-AVR32-11/07
AVR32
The stcond instruction takes 2 cycles if the store is not performed, 3 cycles if the store is performed. All issue latencies are given for accesses to IRAM or LOCAL. These timings must be modified as follows for accesses to BOOT or HSB sections: * A byte, halfword or word load requires 1+w cycles in addition to the count listed in Table 6-7, where w is the number of wait states from the slave and bus system. The pipeline will stall during these cycles. * A doubleword load performs two memory accesses, so 2(1+w) cycles are needed in addition to the count listed in Table 6-7. The pipeline will stall during these cycles. * A byte, halfword or word store requires (1+w) cycles in addition to the count listed in Table 67, where w is the number of wait states from the slave and bus system. Stores to BOOT or HSB can be performed in the background, so the pipeline will only stall if another memory access is attempted during these w cycles. However, multiple stores to addresses in BOOT or HSB can be automatically combined by the memory interface to create bursts on the HSB bus. This means that any consecutive stores to BOOT or HSB sections will not stall the pipeline unless the bus itself inserts wait cycles, for example due to wait states or bus contention. Instructions not performing memory accesses will never stall the pipeline when executed after stores to BOOT or HSB. * A doubleword store performs two memory accesses, but these will be pipelined. The last of these accesses will stall if the instruction following the doubleword is a memory access instruction other than a store to BOOT or HSB. Therefore, a non-memory instruction or another store to BOOT or HSB should be scheduled after a doubleword store to BOOT or HSB for maximum performance.
Table 6-7.
Load and store instructions
Issue latency IRAM 2 2 1 Load unsigned byte with displacement. E E Rd, Rp[disp] Rd, Rb[Ri<Mnemonics C C ld.ub C
Operands Rd, Rp++ Rd, --Rp Rd, Rp[disp]
Description Load unsigned byte with post-increment. Load unsigned byte with pre-decrement.
ld.ub{cond4}
E E
ld.sb E ld.sb{cond4} E
53
32002C-AVR32-11/07
AVR32
Table 6-7. Load and store instructions
C C ld.uh C E E ld.uh{cond4} E C C ld.sh C E E ld.sh{cond4} E C C C ld.w E E E ld.w{cond4} E C C ld.d C E E ldins.b ldins.h ldswp.sh ldswp.uh ldswp.w lddpc E E E E E C Rd, PC[disp] Rd, Rp[disp] Rd, Rp++ Rd, --Rp Rd, Rp[disp] Load unsigned halfword with displacement. Rd, Rp[disp] Rd, Rb[Ri< << 2] Rd, Rp[disp] Rd, Rp++ Rd, --Rp Rd, Rp Rd, Rp[disp] Rd, Rb[Ri<, Rp[disp] Rd, Rp[disp] Indexed Load word. Load word with extracted index. Load word with displacement if condition satisfied. CPU revision 2 and higher only. Load doubleword with post-increment. Load doubleword with pre-decrement. Load doubleword. Load double with displacement. Indexed Load double. Load byte with displacement and insert at specified byte location in Rd. Load halfword with displacement and insert at specified halfword location in Rd. Load halfword with displacement, swap bytes and sign-extend. Load halfword with displacement, swap bytes and zero-extend. Load word with displacement and swap bytes. Load with displacement from PC. 1 1 1 1 3 3 2 2 2 1 1 1 1 1 1 Indexed Load signed halfword. Load signed halfword with displacement if condition satisfied. CPU revision 2 and higher only. Load word with post-increment. Load word with pre-decrement. 1 1 1 2 2 1 Indexed Load unsigned halfword. Load unsigned halfword with displacement if condition satisfied. CPU revision 2 and higher only. Load signed halfword with post-increment. Load signed halfword with pre-decrement. 1 1 1 2 2 1 Load unsigned halfword with post-increment. Load unsigned halfword with pre-decrement. 2 2 1
54
32002C-AVR32-11/07
AVR32
Table 6-7.
lddsp
Load and store instructions
C C C Rd, SP[disp] Rp++, Rs --Rp, Rs Rp[disp], Rs Store byte with displacement. E E Rp[disp], Rs Rb[Ri<st.b
C
st.b{cond4}
E C C
st.d
C E E C C
st.h
C
st.h{cond4}
E C C
st.w
C
st.w{cond4} stcond stdsp
E E C E
sthh.w E stswp.h stswp.w E Rp[disp], Rs E
55
32002C-AVR32-11/07
AVR32
6.11 Multiple data memory access instructions
These instructions perform multiple data accesses. The incremental accesses are performed as word accesses. The number of cycles is dependent on the number of registers to load or store, n. The issue latency must be modified as follows: * LDM and POPM will use an additional cycle if testing of R12 is performed * LDM and POPM that updates PC will cause a change-of-flow, which is performed in parallel with the pointer writeback and therefore has a penalty of only one cycle. * The issue latency for HSB accesses increases if the HSB bus is busy or the slave inserts wait states. The instructions in this group will be aborted immediately if any interrupts are pending, in order to limit the interrupt latency. Table 6-8. Multiple data memory accesses
Issue latency IRAM 1+n 2+n 1+n 2+n 2+n 2+n 1+n 2+n 1+n 2+n Issue latency HSB 1+2n 1+2n 1+2n 1+2n 1+2n 3+n 2+n 3+n 2+n 3+n
Mnemonics ldm ldm ldmts ldmts popm pushm stm stm stmts stmts E E E E C C E E E E
Operands Rp, Reglist16 Rp++, Reglist16 Rp, Reglist16 Rp++, Reglist16 Reglist8 Reglist8 Rp, Reglist16 --Rp, Reglist16 Rp, Reglist16 --Rp, Reglist16
Description Load multiple registers. R12 is tested if PC is loaded. Load multiple registers. R12 is tested if PC is loaded. Load multiple registers for task switch. Load multiple registers for task switch. Pop multiple registers from stack. R12 is tested if PC is popped. Push multiple registers to stack. Store multiple registers. Store multiple registers. Store multiple registers for task switch. Store multiple registers for task switch.
56
32002C-AVR32-11/07
AVR32
6.12 Branch instructions
The branch instructions cause a pipeline flush and change-of-flow if taken. Two cycles must be added to the issue latency if the branch is taken. Table 6-9.
Mnemonics br{cond3} br{cond4} rjmp ret{cond4} C E C C
Branch instructions
Operands disp Branch if condition satisfied. disp disp Rs Relative jump. Conditional return from subroutine with move and test of return value. 1 1 1 Description Issue latency 1
6.13
Call instructions
Call instructions behave similarly to branches, except that the link register (LR) must be updated. The issue latency presented in the table includes the branch penalty. The breakpoint instruction takes a single cycle if Debug mode is disabled, in this case it executes as a nop. The breakpoint instruction updates RAR_DBG instead of LR. Table 6-10.
Mnemonics acall icall mcall rcall E scall breakpoint C C disp Supervisor call Breakpoint. C C E C
Call instructions
Operands disp Rd Rp[disp] disp Relative call. 4 4 3 Description Application call Register indirect call. Memory call. Issue latency 4 4 4 4
6.14
Return from execution mode instructions
The rete and rets instruction may pop the status register and return address from the system stack, and perform a branch to the return address. The retd instruction gets the return address and return status registers from the RAR_DBG and RSR_DBG system registers. The issue latency presented in the table includes the branch penalty. The rete instruction has a latency of 12 cycles when returning from INT0-INT3 modes, 5 cycles otherwise. The rete instruction can be aborted by a pending interrupt. Table 6-11.
Mnemonics retd rete rets C C C
Return from execution mode instructions
Operands Description Return from debug mode Return from exception Return from supervisor call Issue latency 3 5 / 12 5
57
32002C-AVR32-11/07
AVR32
6.15 Swap instructions
The swap instruction performs two atomical memory accesses, first one read and then one write. Table 6-12.
Mnemonics xchg E
Swap instructions
Operands Rd, Rx, Ry Description Exchange register and memory. Issue latency 2
6.16
System register instructions
This group moves data to and from the system registers. Accesses to system registers are performed in the EX stage, taking one cycle. MTSR to SREG takes 3 cycles, MTSR to all other system registers takes 1 cycle. Table 6-13.
Mnemonics mfdr mfsr mtdr mtsr musfr mustr E E E E C C
System register instructions
Operands Rd, SysRegNo Rd, SysRegNo SysRegNo, Rs SysRegNo, Rs Rs Rd Description Move debug register to Rd. Move system register to Rd. Move Rs to debug register. Move Rs to system register. Move Rs to status register. Move status register to Rd. Issue latency 1 1 1 1/3 1 1
6.17
System control instructions
This group contains simple single-cycle instructions that control the behaviour of different parts of the system. The frs, pref and sync instructions are executed as NOP in AVR32UC. Table 6-14.
Mnemonics frs pref sleep sync C E E E Rp[disp] Op8 Op8
System control instructions
Operands Description Invalidate the return address stack. Prefetch cache line. Enter SLEEP mode. Flush write buffer. Issue latency 1 1 1 1
6.18
Read-modify-write instructions
This group contains instructions that perform atomical bit-operations on memory addresses. These instructions require multiple cycles inside the memory controller, but these can be performed in parallel with subsequent instructions if the following instructions are not memory access instructions. A RMW instruction performed on an address in the IRAM section executes in a single cycle if the IRAM write buffer is empty. If the write buffer is not empty, two cycles are required. The pro-
58
32002C-AVR32-11/07
AVR32
grammer can make sure the buffer is empty by ensuring that the instruction immediately before the RMW instruction is not a store or another RMW instruction. If the RMW instruction is performed on an address in the HSB section, four cycles are needed for the RMW instruction to be executed. Therefore, if another instruction attempts to access memory within one of the three following clock cycles, up to three stall cycles will be inserted. If a memory access instruction is scheduled less than 3 cycles after the RMW to HSB instruction, 3-n stall cycles are inserted. Here n is the number of cycles used by instructions between the RMW instruction and the first memory access instruction. RMW operations to the HSB section will take additional cycles if the HSB inserts wait states. When using RMW instructions, try to schedule code so that stall cycles are avoided. Table 6-15.
Mnemonics memc mems memt E E E
Read-modify-write instructions to IRAM section
Operands imm, bp imm, bp imm, bp Description Clear bit in memory. Set bit in memory. Toggle bit in memory. Execution cycles IRAM 1/2 1/2 1/2 Execution cycles HSB 4 4 4
6.19
6.19.1
Code example
Assumptions In the example code given in this chapter, the following assumptions are made: * r0 points to an address in the IRAM space. IRAM is an alias for r0. * r1 points to an address in the HSB or BOOT space. HSB is an alias for r1. * All memories and buses have 0 wait state access. * The CPU is in a priviliged mode, so that no privilege violations occur * All instructions are executed in the precise sequence shown below
Instruction add r5, r0 sub r5, r5 ssrf AVR32_SREG_C ssrf AVR32_SREG_GM max r6, r1, r0 mul r5, r1 mac r5, r1 mac r3, r1 mac r3, r1 macs.d r2, r1, r2 mulwh.d r6, r5, r1:t st.w IRAM[0], r6 divs r4, r5, r6 Cycles 1 1 1 3 1 1 1 2 1 4 1 1 35 1 cycle since r5 is already in the accumulator cache 2 cycles since r3 is not in the accumulator cache 1 cycle since r3 is not in the accumulator cache 4 cycles since register pair r3:r2 is not in the accumulator cache 48 bit result calculated and written back in 1 cycle SSRF to bits 31-16 takes 3 cycles Description
59
32002C-AVR32-11/07
AVR32
Instruction ld.w r8, IRAM++ satadd.w r4, r8, r9 ld.w r4, IRAM[4] add r4, r4 ld.w r5, IRAM[8] ld.w r6, IRAM[12] ldm IRAM++, r5, r6, r7, r8 mfsr r8, AVR32_SREG cbr r8, 0 mtsr AVR32_SREG, r8 st.w IRAM[0], r5 st.w IRAM[4], r6 nop ld.w r5, HSB[0] ld.w r6, HSB[4] st.w HSB[8], r6 add r5, r6 and r7, r8 st.w HSB[8], r6 st.w HSB[12], r7 st.w HSB[16], r8 add r5, r6 ldm HSB++, r5, r6, r7, r8 stm HSB, r5, r6, r7, r8 add r6, r5 memc IRAM, 3 st.w IRAM[4], r8 memc IRAM, 3 mul r5, r9 memc HSB, 7 ld.w r4, IRAM[16] memc HSB, 7 sub r7, r4 mul r6, r9 or r5, r8 Cycles 2 1 1 1 1 1 5 1 1 3 1 1 1 2 2 1 1 1 2 1 1 1 9 6 1 1 1 2 1 4 1 1 1 1 1 Memory not accessed in the 3 following clock cycles, no stall Next instruction has a memory access, 3 stall cycles needed Stores to IRAM can be adjacent without any stalling nop takes 1 cycle Reading from memories on the bus takes 2 cycles HSB bus reads are not pipelined, each read takes 2 cycles HSB store done in background if 2 next insn is not mem access Nonmem insn scheduled after HSB store to avoid stall Nonmem insn scheduled after HSB store to avoid stall First of consecutive HSB stores requires extra cycle to start Consecutive HSB stores are pipelined. Consecutive HSB stores are pipelined. Consecutive HSB stores followed by 1 nonmem insn do not stall 1+n=1+2r where r=#regs when reading from HSB addresses 1+n=1+(1+r) where r=#regs when writing to HSB addresses Instruction not using the write buffer Requires only one cycle because write buffer was empty Instruction filling the write buffer Requires 2 cycles because write buffer was not empty mtsr to SREG takes 3 cycles, 1 cycle required for other sysregs Loads from IRAM can be adjacent without any stalling ldm from IRAM takes 1+n= 5 cycles when loading 4 registers No data hazard after loads from IRAM Description Load with postincrement takes two cycles
60
32002C-AVR32-11/07
AVR32
7. OCD system
7.1 Overview
The AVR32 CPU is targeted at a wide range of 32-bit applications. The CPU can be delivered in very different implementations in various ASIC's, ASSP's, and standard parts to satisfy requirements for low-cost as well as high-speed markets. According to the cost sensitivity and complexity of these applications, a similar span in debug complexity must be expected. While some users expect very simple debug features, or none at all, others will demand full-speed trace and RTOS debug support. This also applies to the debug tools: While the simplest development takes place on simulators and development boards, most will require basic on-chip debug emulators, and a few will require complex emulators with full-speed trace. To match these criteria, the AVR32 OCD system is designed in accordance with the Nexus 2.0 standard (IEEE-ISTO 5001TM-2003), which is a highly flexible and powerful open on-chip debug standard for 32-bit microcontrollers. 7.1.1 Features * Nexus compliant debug solution * OCD supports any CPU speed * Execute debug specific CPU instructions (debug code) from program memory monitor or external debugger * Debug code can read and write all registers and data memory * Debug code can communicate with debugger through the debug port * Debug mode can be entered by external command, breakpoint instruction, or hardware breakpoints * Six program counter hardware breakpoints are supported * Two data breakpoints are supported * Breakpoints can be configured as watchpoints (flagged to the external debugger) * Hardware breakpoints can be combined to give break on ranges * Real-time program counter branch tracing * Real-time data trace * Real-time process trace * Nexus Class 2+ 7.1.2 OCD controller overview The OCD system interfaces provides the external debugger with access to the on-chip debug logic through the JTAG port and the Auxiliary (AUX) port, as shown in Figure 7-1. The operation is described briefly below and in more detail in separate chapters. Host, debugger, and emulator At the host side, the user debugs his software using a source level debugger, which can read his compiled and linked object code. The source level debugger accesses features in the emulator and OCD system through an API (defined by the vendor or based on the Nexus recommendations), which constitutes the abstract interface between the source level debugger and the emulator. The API translates high-level functions, such as setting breakpoints or reading memory areas, to sets of low level commands understood by the OCD controller. Certain operations
7.1.2.1
61
32002C-AVR32-11/07
AVR32
(such as reading the register file) may require running sections of debug code on the CPU, which can also be handled in this level. The emulator translates the communication from the host into commands transmitted to the target over the JTAG port. If trace is enabled, trace messages are transmitted from the device on the Nexus-defined auxiliary (AUX) port. The AUX port can be scaled to the number of output pins needed to sustain the estimated bandwidth requirement. The Nexus protocol defines the format of the messages and signals, the pin count options and pinout of the debug port, and the type of connector used. Figure 7-1. Block diagram of the OCD system (shaded) and its main connections.
TAP
Service Access Port (SAP)
AUX Port OCD system JTAG Port
Emulator
Host
Transmit Queue
W atchpoint msg Data Trace msg
Service Access Bus (SAB)
Debug Status msg Trigger
CPU observation units
Data Trace Program Trace PC Comparators
Branch Trace Message
Trigger
Ownership Trace Message
Flow Control Unit
Breakpoint
Breakpoint Unit Data Comparators
Ownership Trace Unit
OCD Debug control inst signals
CPU observation signals
CPU
7.1.2.2 Accessing the debug features A number of blocks handle the various debug functions specified by the Nexus standard. The emulator communicates with registers in these blocks by commands on the JTAG port, as specified by the Nexus standard. OCD registers are typically used for configuration, control, and status information. Trace information and debug events can also generate messages to be transmitted on the AUX port. Registers are indexed and are accessed through Read Register and Write Register messages from the emulator. Alternatively, they can be accessed by the CPU through mtdr and mfdr instructions, which gives a debug monitor in the CPU access to most of the debug features in the OCD system, as described in "OCD Register Access" on page 72.
62
32002C-AVR32-11/07
AVR32
7.1.2.3 Transmit Queue Trace and watchpoint messages are inserted into the Transmit Queue (TXQ) before being transmitted on the AUX port. This provides some flexibility between the peak rate of trace message generation and the average rate of message transmission on the AUX port. Flow Control Unit The Flow Control Unit (FCU) can bring the CPU into and out of Debug Mode, and control the CPU operation in Debug Mode. The behavior is controlled by accessing OCD registers. Debug Mode can be configured as OCD Mode or Monitor Mode. In OCD mode, The CPU fetches instructions from the Debug Instruction Register. If the register is empty, the CPU is halted. In Monitor Mode, the CPU fetches debug instructions from a monitor code in the program memory, and the Debug Instruction Register is not used. The FCU also handles single stepping by returning the CPU to normal mode, letting the CPU fetch one instruction from the program memory, and then returning to Debug Mode on the following instruction. 7.1.2.5 Breakpoint modules A number of instruction and data breakpoint modules can be configured for run-time monitoring of the instruction fetches and data accesses by the CPU. The modules can report if the monitored operation matches a predefined address, alternatively, also a data value. The modules operate on virtual addresses. A breakpoint will bring the CPU into Debug Mode. Watchpoints are reported to the debugger, but does not affect CPU operation. A watchpoint can also be configured to start or stop data and program trace. The breakpoint modules can be combined to produce a watchpoint or breakpoint. Complex breakpoint/watchpoint conditions are supported, e.g. trigger when a specific procedure writes a certain variable with a specific value. 7.1.2.6 Program and Data Trace The Program Trace Unit sends Branch Trace Messages to the debugger, which allows the program flow to be reconstructed. To keep the amount of debug information low to save bandwidth, only change of program flow are reported (such as unconditional branches, taken conditional branches interrupts, exceptions, return operations, and load operations with PC as destination), hence the term "branch tracing". Messages are typically relative to the previously transmitted message, to be able to compress information as much as possible. Thus, the trace messages are sent out in temporal order, and regularly, synchronization messages with uncompressed, absolute addresses, are transmitted in case synchronization is lost. The Data Trace Unit similarly traces data accesses, for read or write accesses, or both. Similar relative address compression and synchronization schemes are used for Data Trace Messages. Since new trace messages can be generated before the previous ones have been transmitted, all trace messages are queued before being transmitted by the AUX interface. If the queue overflows, the CPU can be halted to avoid losing trace information, or an error message followed by synchronization trace messages will be transmitted. 7.1.2.7 OS debug support Applications developed on an OS platform places special requirements on the OCD controller and the debug software. For high-level debugging, the user will want to see which process is
7.1.2.4
63
32002C-AVR32-11/07
AVR32
running at any time, without having to interrupt the CPU or trace the program flow. This is accomplished through Ownership Trace Messaging, in which the process ID of the running process is reported at every process switch. The CPU writes the process ID to an OCD register in the Ownership Trace Unit, which in turn generates an Ownership Trace Message. 7.1.2.8 Timestamps The emulator can tag events with a timestamp when they are extracted from the OCD system and transmitted to the emulator, to provide timing information for these events when they are transmitted to the debug host. However, due to the delay of the transmit queue and transmit time over the AUX port, this timing will have limited accuracy. To compensate for this, the EVTO pin can be configured to toggle every time a message is inserted into the Transmit Queue, thus indicating very precisely when each event occurs. The emulator would then store a queue of timestamp tags with each event, and associate each tag with the corresponding message, as they are extracted on the AUX port.
7.2
CPU Development Support
The OCD system can bring CPU into and out of Debug Mode, and control the CPU operation in Debug Mode. The behavior is controlled by OCD register configuration, stop commands from the debugger, or breakpoints. The OCD registers can be accessed by Nexus messages or from the CPU as memory-mapped registers.
7.2.1
Debug Mode Debug Mode is an execution mode dedicated to application debugging and is not intended for running application code. Debug Mode can execute a debug code either from an external debugger through the OCD system (OCD Mode), or from a debug routine in program memory (Monitor Mode). The debug code will typically read out system registers and information about the various processes running in the system before restarting. The Nexus class 2+ compliant OCD system contains breakpoint and trace modules, and other features for debugging code on the CPU. These features are generally accessible both in OCD Mode and Monitor Mode. In OCD Mode, the debugger accesses the features through messages over the AUX debug port, and in Monitor Mode, the CPU accesses the features through mtdr and mfdr instructions. The OCD system runs at system speed to stay synchronous with the CPU at all times. If the CPU is in a low-power sleep mode, it is woken up before entering Debug Mode.
7.2.1.1
Operations in Debug Mode Debug Mode is characterized by the Debug (D) bit in the Status Register (SR) in the CPU. Debug Mode is a privileged mode, and all legal instructions and memory operations are permitted Illegal opcodes or memory operations which would normally cause an exception will be ignored in Debug Mode. The Debug Mode has a dedicated Link and Return Status Register (RAR_DBG and RSR_DBG, respectively) but no other masked registers. RAR_DBG and RSR_DBG are not observable as part of the register file, only as system registers. The register file view is mapped according to the mode bits in the Status Register (M[2:0]). These bits are set to the exception context when entering Debug Mode, but can be changed freely within Debug Mode by writing to SR. In this way, different register contexts can be observed and modified, while maintaining the execution and access privileges of Debug Mode.
64
32002C-AVR32-11/07
AVR32
Debug Mode is exited by the retd instruction, both in Monitor Mode and OCD Mode. This restores PC from RAR_DBG and SR from RSR_DBG. 7.2.1.2 A typical debug session flow Figure 7-2 shows an example of a typical flow in Debug Mode. A software or hardware breakpoint aborts the execution of an instruction and causes Debug Mode to be entered. If the Monitor Mode (MM) bit in the Development Control (DC) OCD register is set, Monitor Mode is entered, and the CPU jumps to the software debug monitor starting at EVBA+0x01C. Otherwise, OCD Mode is entered, and the CPU stalls while waiting for instructions to be entered by the external debugger through the Debug Instruction (DINST) OCD register. In either case, the D bit in the CPU Status Register is set during the whole debug session, giving access to all privileged operations. Any number of instructions can be executed before returning to the breakpointed instruction by the retd instruction. RAR_DBG stores the address of the breakpointed instruction, and manipulating RAR_DBG in Debug Mode is useful if a different return address is desired (for instance, to avoid repeated hits on a breakpoint instruction). Figure 7-2. Example of flow in Debug Mode.
User code
Debug Mode Breakpointed instruction
LR_DBG
DC:MM?
External Debugger
0 = OCD Mode
1 = Monitor Mode
Write Register commands EV BA +0x300
DINST
Inst
Instructions from externaldebugger SR:D = 1
Software debug monitor SR:D = 1
retd
retd
7.2.2
Monitor Mode If the Monitor Mode (MM) bit in the Development Control register (DC) is set, the CPU will enter Debug Mode in Monitor Mode. Instructions are fetched from the monitor code located in the program memory at the Exception Vector Base Address (EVBA) + 0x01C. The monitor code contains the necessary mechanisms to read and modify CPU and system registers, and memory
65
32002C-AVR32-11/07
AVR32
areas. All other exceptions and interrupts are masked by default when entering Monitor Mode, but the monitor code can explicitly unmask interrupts to allow critical interrupts to be serviced while the system is being debugged. The monitor code will typically communicate with an external debug tool, or (in cases of advanced systems like PDA's) a debug tool running within the application (self-hosted debugger). Communication with the external tool may take place over any communication link present in that device (e.g. USB, RS232), if such a communication line can be reserved for debug purposes. Alternatively, the Debug Communication Mechanism in the OCD system can be used to communicate between the CPU and emulator over the JTAG port. This is a set of OCD registers which can be written by the CPU or emulator, allowing a communication protocol to be developed in software. This mechanism can be used in any privileged CPU mode, including OCD Mode. Monitor Mode is exited with the retd instruction. 7.2.2.1 Debugging a monitor code Each execution mode has a mask bit in SR, which indicates if a request to enter that mode will be taken or masked. The default priority of modes are reflected in these bits: When entering an execution mode, modes of the same or lower priority are masked. Privileged modes can override the mask, to dynamically change priorities (e.g. to allow critical interrupts to be serviced). By default, Debug Mode has priority above all other execution modes. This implies that any supervisor or user code can be interrupted by Debug Mode. Other modes can be explicitly unmasked by a monitor code to allow critical interrupts to be serviced. By default, Debug Mode is masked by the Debug Mask (DM) bit in SR when executing in Monitor Mode. The Monitor Mode can stack away the RAR_DBG and RSR_DBG and then explicitly clear the DM bit to enable Debug Mode to be re-entered. If a debug exception occurs in Monitor Mode, the OCD system will bring the CPU into OCD Mode, even if the MM bit is set. This allows Monitor Mode programs to be debugged. 7.2.3 OCD Mode If the Monitor Mode (MM) bit in the Development Control register (DC) is cleared, the CPU will enter Debug Mode in OCD Mode. When the CPU is in OCD Mode, the Debug Status (DBS) bit in the Development Status (DS) register is set, in addition to the D bit in SR in the CPU. OCD Mode is similar to Monitor Mode, except that instructions are fetched from the OCD system. OCD instructions are loaded by the debug tool by writing the opcode to the Debug Instruction register (DINST). Once an instruction is written to DINST, the CPU will fetch it, and the Instruction Complete bit in DS (DS:INC) will be cleared until the CPU has completed the operation. The CPU is then halted until DINST is written again. The first instruction entered must be aligned to the MSB of DINST. A sequence of instructions can be entered to DINST one word at a time, in the same sequence they would appear in program memory, i.e. they do not need to be word aligned. If the upper halfword of an extended instruction is written to the lower halfword of DINST, the lower halfword of the instruction is written as the upper halfword of DINST in the next access. If the last instruction in a sequence is written to the upper halfword of DINST, the lower halfword should be written with a nop opcode. See Figure 7-3 for an illustration of a sequence of operations used to execute instructions in OCD Mode.
66
32002C-AVR32-11/07
AVR32
Any instruction valid in Monitor Mode is also valid in OCD Mode. Memory operations can be conducted without any special synchronization with external hardware. All OCD units can be configured while the CPU executes in OCD Mode, but the following debug features are disabled: * PC breakpoints * Data breakpoints * Watchpoints * Program Trace * Data Trace OCD Mode is exited by writing the retd instruction to DINST. Figure 7-3. Executing instructions on the CPU in OCD Mode.
OCD Instructions mov r12,r7 sub r12,0x01
Opcode 0x0E9C 0x201C 0x1896 0xF807 0046 0xD623
Written by tool to DINST 0x0E9C201C
Changes in DS INC01
mov r6,r12 adc retd r6,r12,r7
0x1896F807 0x0046D623
INC01 INC01 DBS0
7.2.4
Entry into Debug Mode Debug Mode can only be entered when the OCD is enabled, and Debug Mode is not masked. The following ways of entry are then possible: * Debug request from the debugger * Program counter breakpoint * Data address or value breakpoint * breakpoint instruction * Trapping opcode 0x0000 * Single step * Event on EVTI pin * Abort command from the debugger The debugger can identify the condition which caused entry into Debug Mode by examining the status bits in the Development Status register (DS). Each cause of entry has a particular bit associated with it. Several exceptions can trigger simultaneously, causing more than one bit to be set. Note that any privileged CPU mode may write the SR:D bit to one directly, but this will not cause entry to Debug Mode.
67
32002C-AVR32-11/07
AVR32
7.2.4.1 Debug request The debugger may want to stop CPU operation, unrelated to current instruction execution, e.g. if the user presses a "STOP" button in the debug tool GUI. The debugger will then write the Debug Request (DBR) bit in the Development Control Register (DC). This causes the CPU to enter Debug Mode on the next instruction to be executed, before execution. Program counter breakpoint The Program Counter breakpoints can be configured to halt the CPU when executing code at a specific address, or address range. This will cause the CPU to be halted before the breakpointed instruction is executed. The Ignore First Match (IFM) bit in the Development Control (DC) register should be written to one before exiting Debug Mode, to avoid re-triggering the program breakpoint. This bit only prevents program breakpoints from re-triggering. If the instruction causes a breakpoint for another reason (e.g. a breakpoint instruction or a data breakpoint), Debug Mode will be re-entered. 7.2.4.3 Data address or value breakpoint CPU memory accesses can be monitored by data breakpoint comparators in the OCD system. If the access matches a set of predefined conditions (e.g. address, value, or access type), Debug Mode is entered after the memory operation completes, but before the next instruction is executed. Data breakpoints are precise, halting on the instruction immediately after the memory operation which caused the breakpoint. The CPU will return to the first non-executed instruction when a retd is executed. 7.2.4.4 breakpoint instruction The breakpoint instruction is programmed along with the object code into the program memory or instruction cache, and is decoded by the CPU. When this instruction is scheduled for execution and Debug Mode is enabled, the CPU will enter Debug Mode. If Debug Mode is disabled (e.g. masked by the DM bit in the Status Register, or DBE in DC is zero), the breakpoint instruction will execute as a nop (no operation). For devices based on volatile program memory, the breakpoint instruction can be dynamically inserted into the code by the debug tool, enabling an unlimited number of program breakpoints in the code. This involves replacing an existing opcode with a breakpoint instruction. The replaced opcode has to be re-inserted before exiting Debug Mode. Note that this is only possible in OCD Mode. For devices based on non-volatile program memory, the breakpoint instruction can be statically compiled or linked into the code before downloading, marking all points the program can be halted. Debug Mode will be entered for all breakpoints (if Debug Mode is enabled), and the debugger would return immediately if it does not want to halt at a particular breakpoint location in the code. The breakpoint will be taken before the breakpoint instruction is actually executed. This has the effect that the CPU will return from Debug Mode to the same breakpoint instruction, re-entering Debug Mode immediately, unless the OCD system is configured to modify the return address or replace the breakpoint instruction from the instruction flow. The IFM bit does not have an effect when Debug Mode returns to a breakpoint instruction.
7.2.4.2
68
32002C-AVR32-11/07
AVR32
7.2.4.5 Trapping opcode 0x0000 In Flash-based microcontrollers, the opcode 0x0000 can overwrite any other opcode without having to erase and reprogram the Flash. Therefore this instruction can enter Debug Mode, as for the breakpoint instruction. However, the opcode 0x0000 is also a valid part of the instruction set (ADD R0,R0 in AVR32) and can be part of the software to be debugged. Therefore, the user must write the DC:TOZ (Trap Opcode Zero) bit to one to enable this feature. The DS:BOZ bit will be set if Debug Mode is entered due to a trapped 0x0000 instruction. The debugger must then identify whether this opcode belongs to the original object file or has been inserted by the debugger as a software breakpoint. If it was part of the object file, the debugger should use the Instruction Replacement to return to the program, and insert the 0x0000 opcode in DINST. Executing 0x0000 during Instruction Replacement only performs an ADD R0,R0 operation without re-entering Debug Mode. 7.2.4.6 Single stepping The debugger will typically allow the user to step through the application source or object code, line by line. This single stepping can be either of step-into or step-over type. Step-into will execute exactly one instruction and halt the CPU at the start of the next instruction, regardless of whether this instruction is part of the main program, subroutine, interrupt, or exception. Stepover will execute the current instruction and any lower-level events generated before the following instruction (including subroutines, interrupts, and exceptions). Step-over in the object code and all single stepping in the source code are implemented by configuring a program breakpoint on the address of the next object code instruction where the debugger expects to halt.s Step-into is implemented in OCD hardware and is controlled by the Single Step (SS) bit in the Development Control register. When Debug Mode is exited by retd, exactly one instruction from the program memory will be executed before Debug Mode is re-entered. This mechanism works identically for OCD and Monitor Mode. 7.2.4.7 Event on EVTI pin If the Event In Control (EIC) bits in DC are written to 0b01, a high-to-low transition on the EVTI pin will generate a breakpoint. EVTI must stay low for one CPU clock cycle to guarantee that the breakpoint will trigger. The External Breakpoint (EXB) bit in DS will be set when a breakpoint is entered due to an event on the EVTI pin. Abort command Some software errors could cause the CPU to get stuck in a state which does not allow Debug Mode to be entered through the mechanisms described above. An example is if a privileged mode writes SR:DM to one, without clearing the bit. To prevent the debugger from hanging indefinitely, the debugger can write the DC:ABORT bit to one after some timeout period, and force the CPU to enter Debug Mode. The abort command behaves identical to a debug request, except that the DM bit and any pending exception will be ignored, regardless of exception priority. The RAR_DBG and RSR_DBG will reflect the last nonexecuted instruction, which can aid in locating the error. If Debug Mode is entered due to an abort command, DS:DBA will be set, as for debug requests.
7.2.4.8
69
32002C-AVR32-11/07
AVR32
7.2.5 Exceptions and Debug Mode Debug Mode has priority over any execution mode, so that breakpoints can be set in exception and interrupt routines. However, if a breakpoint is set on an instruction which triggers a critical exception, the breakpoint is flushed. Critical exceptions are exception which are asynchronous to the CPU (interrupts), exceptions which invalidate the currently fetched instruction (e.g. instruction address exceptions), and exceptions which indicate that the system has become unstable and should abort the program flow (e.g. bus error). The complete list of exceptions with higher priority than Debug Mode are listed in the exception chapter in the AVR32 Architecture Manual. If a PC breakpoint, a breakpoint instruction, or a trapped 0x0000 opcode is flushed by an exception, Debug Mode will not be entered. If another type of breakpoint has triggered, Debug Mode will be entered on the first instruction in the exception handler. In the rare cases where the first instruction in a critical exception also triggers a critical exception (e.g. if EVBA is set incorrectly, triggering an infinite loop of instruction address exceptions), the debugger must write the DC:ABORT bit to one to halt the CPU and enter Debug Mode to identify the error. 7.2.6 Instruction replacement A convenient way of implementing an unlimited number of instruction breakpoints is letting the debugger replace an instruction by a breakpoint instruction. This mechanism is only available in OCD Mode on devices implemented with writeable program memory or writeable instruction cache. If this instruction executes, Debug Mode will be entered, and the debugger identifies the breakpointed location. When returning, the breakpoint instruction must be replaced by the original instruction. The debugger will write the Instruction Replace (IRP) bit in DC and the appropriate instruction in the Debug Instruction Register and its corresponding PC value in the Debug Program Counter (DPC). When retd is executed, PC and SR are restored, but one more instruction is fetched from the OCD system before returning to fetching from program memory. Note that instruction replacement operates on word boundaries. The debugger must store the whole word containing the replaced opcode before inserting the breakpoint instruction. Also note that DPC should always be written when performing an instruction replacement to ensure the correct instruction is executed. The debugger will then perform the following sequence when exiting OCD Mode. Note that RAR_DBG is accessed through executing CPU instructions through the Debug Instruction register (DINST). The same sequence can be used both for compact and extended instructions, regardless if the extended instruction is unaligned (in which case only the upper halfword of the instruction is replaced). 1. Write RAR_DBG to the Debug Program Counter. 2. Increment RAR_DBG by 2 or 4, so the register points to the start of the next word in the program memory. 3. Write 1 to Instruction Replace (IRP) in DC. 4. Write a retd instruction to DINST. The CPU will exit Debug Mode and stall while waiting for new instructions. 5. Write the stored word to DINST. This instruction is fetched by the CPU, and the CPU continues normal program execution.
70
32002C-AVR32-11/07
AVR32
7.2.6.1 Instruction replacement example Table 7-1 shows an example of a code where the user wants to insert a breakpoint. Table 7-1.
PC value 0x000010 0x000012 0x000014 0x000016 0x00001A
Example of a user code section
Opcode 0x0E9C 0x201C 0xC0AC 0xF8070046 0x2027 Instruction mov r12,r7 sub r12,0x01 rcall label1 adc r6,r12,r7 sub r7,0x02
The tool wants to insert a software breakpoint on the instruction "adc r6,r12,r7" on PC=0x000016. This is an extended instruction, and only the upper halfword needs to be replaced by the breakpoint instruction. 1. The upper halfword is contained within the word located at 0x000014, and the debug tool stores this value (0xC0ACF807). 2. The debugger writes a breakpoint instruction (opcode 0xD673) to location 0x000016 in the CPU's program memory to replace the most significant word of the breakpointed instruction. 3. When the breakpoint instruction executes, the CPU will enter OCD Mode, and DS:DBS and DS:SWB are set, indicating that OCD Mode is entered due to a software breakpoint. 4. The tool performs a normal sequence of operation in OCD Mode. 5. When the tool is ready to return to normal CPU operation, it reads the RAR_DBG value to find the return address. 6. The tool inserts CPU instructions to DINST to increment RAR_DBG by 2, so it is aligned to the next word in the program memory. 7. The tool inserts a "retd" instruction to DINST. The tool will receive a Debug Status message, which indicates that the CPU has exited OCD Mode, and is now waiting for one more instruction from the tool. 8. The tool writes the return address (0x000016) to the Debug Program Counter (DPC). 9. The tool looks up the stored instruction word (based on the return address) and writes this value (0xC0ACF807) to the Debug Instruction Register (DINST). The CPU now resumes normal operation. 7.2.7 Sleep Mode If the CPU is in sleep mode, it will not receive clocks nor respond to an OCD request from the debugger. Thus, if the Debug Request bit in DC is written to one while the CPU is in sleep mode, the CPU will automatically return to active mode. The instruction following the sleep instruction will be tagged with an OCD exception, and the CPU will jump directly to Debug Mode. The normal debug procedure can be followed while executing in Debug Mode. If Debug Mode is entered from sleep mode, the Stop Status (STP) bit in the Development Status register will be set. When returning from Debug Mode, the CPU will by default return to the instruction following the sleep instruction. The debugger can handle this situation in two ways: Ignore the problem, effectively waking the CPU from sleep mode on a debug request.
71
32002C-AVR32-11/07
AVR32
Decrement RAR_DBG in Debug Mode to return to the sleep instruction. This places the CPU back into sleep mode after exiting Debug Mode. 7.2.8 OCD Register Access The OCD registers control the OCD system. Their specification is based on the Nexus Recommended Registers as outlined in the Nexus Standard Specification [IEEE-ISTO 5001TM-2003]. All registers can be accessed through the JTAG interface. OCD features in Debug Mode When the CPU executes in Debug Mode, certain OCD features will be disabled. The following table indicates how the various OCD features will behave in Debug Mode. For more information on the specific features, please see the indicated page. Table 7-2.
Feature Program Breakpoints (HW) Software Breakpoints Data Breakpoints Watchpoints (program and data) Program Trace Data Trace Ownership Trace Debug Communication Mechanism
7.2.9
OCD features in Debug Mode
Available in Debug Mode? Yes, in Monitor Mode when SR:DM is cleared Yes, in Monitor Mode when SR:DM is cleared Yes, in Monitor Mode when SR:DM is cleared Yes, in Monitor Mode No No Yes Yes
7.2.10
OCD Registers Accessed by CPU A monitor program running on the target can access the OCD registers through mtdr and mfdr instructions. These instructions transfer data between a register in the register file and an OCD register, according to the register index given in "OCD Register Summary" on page 112. These instructions can also be used in OCD mode to transfer information from the register file and system registers to the debugger, through the Debug Communication Mechanism. Runtime write access to OCD registers The OCD registers can always be accessed by JTAG when the when the OCD system is not enabled or the CPU is in OCD Mode. The OCD registers can also be read by JTAG at any time, and by the CPU in any privileged mode. When the CPU is in other modes - either running normal code, or executing in Monitor Mode the OCD registers can be written by JTAG as specified in Table 7-3. If the registers are accessed in another way than specified, undefined operation may result. The OCD Register Protect (ORP) bit in DC define the allowed write access to OCD registers in privileged modes. If the ORP bit in DC does not allow CPU access to OCD registers in the cur-
7.2.11
72
32002C-AVR32-11/07
AVR32
rently executing mode, only PID and DCCPU can be written. Illegal access to the registers will be ignored with no error reporting. Table 7-3. OCD Register access
Can be written by JTAG while CPU is running? Yes Yes Can be written to disable / enable trace channels. Can only be written while trace channel is disabled Can only be written while trace channel is disabled Can be written to disable / enable watchpoints / breakpoints. Can be written to disable / enable watchpoints / breakpoints. Can only be written while breakpoint / watchpoint is disabled Can only be written while breakpoint / watchpoint is disabled Can only be written while breakpoint / watchpoint is disabled Yes No No Yes Yes Can be written by CPU in Monitor Mode? Yes Yes Yes Yes Yes
Register Development Control (DC) Watchpoint Trigger (WT) Data Trace Control (DTC) Data Trace Start Address (DTSA) Channel 1 to 2 Data Trace End Address (DTEA) Channel 1 to 2 PC Breakpoint/Watchpoint Control (BWC)
Yes, if SR:DM is set.
Data Breakpoint/Watchpoint Control (BWC)
Yes, if SR:DM is set. Yes, if SR:DM is set or breakpoint disabled. Yes, if SR:DM is set or breakpoint disabled. Yes, if SR:DM is set or breakpoint disabled. Yes No No Yes Yes
PC Breakpoint/Watchpoint Address (BWA)
Data Breakpoint/Watchpoint Address (BWA)
Breakpoint/Watchpoint Data (BWD)
Ownership Trace Process ID (PID) Debug Instruction Register Debug Program Counter Debug Communication CPU (DCCPU) Debug Communication Emulator (DCEMU)
7.2.12 7.2.12.1
Messages Debug Status (DEBS) This message is output when the CPU enters or exits Debug Mode or a low-power mode. The message is output whenever the AUX port is enabled. The STATUS field of this message contains the information in the Development Status register. The field will contain these values: * The CPU enters Debug Mode: STATUS bits indicate cause of entry to Debug Mode. DBS is set if OCD Mode was entered.
73
32002C-AVR32-11/07
AVR32
* The CPU exits Debug Mode: STATUS = 0. This includes exiting Debug Mode by writing DC:RES. * The CPU enters a low-power mode: Only the STP bit is set, while the other bits are zero. * The CPU exits a low-power mode: STATUS = 0 Table 7-4. Debug Status
Debug Status Message Packet Size 32 6 Packet Name STATUS TCODE Packet Type Fixed Fixed Description The contents of the Development Status register. Value = 0
7.2.13 7.2.13.1
Registers Device ID Register (DID) The Device ID Register (DID) provides key attributes to the development tool concerning the embedded processor. This is the same as the value returned by the JTAG ID instruction. Table 7-5.
R/W R R R R
DID Register
Field Name RN PN MID Reserved Init. Val. Part specific Part specific 0x01F 1 Description RN - Revision Number PN - Product Number Manufacturer ID 0x01F = ATMEL Reserved This bit always reads as 1
Bit Number 31:28 27:12 11:1 0
7.2.13.2
Nexus Configuration Register (NXCFG) The Nexus Configuration Register (NXCFG) provides key information about the specific implementation of the CPU and OCD architecture, and the configuration of the Nexus development features on this device. This information is static, and may be used to develop generic Nexus debuggers which will work across a family of AVR32 devices with different Nexus configurations. Table 7-6.
R/W R R
Nexus Configuration Register
Field Name Reserved NXDMA Init. Val. 0 0 Direct Memory Access support 0 = Not supported 1 = Supported Data Trace Channels 0 = Not supported 1 = Supported Description
Bit Number 31:29 28
R
27:25
NXDTC
0
74
32002C-AVR32-11/07
AVR32
Table 7-6.
R/W R
Nexus Configuration Register
Field Name NXDRT Init. Val. 0 Description Data Read Trace Support 0 = Not supported 1 = Supported Data Write Trace Support 0 = Not supported 1 = Supported Ownership Trace support 0 = Not supported 1 = Supported Program Trace support 0 = Not supported 1 = Supported AUX MDO pins 0 = no MDO or MSEO pins n = n MDO pins, NXMSEO MSEO pins AUX MSEO pins 0 = 1 MSEO pin 1 = 2 MSEO pins Number of Data breakpoints Number of PC breakpoints OCD Version 0000 = AVR32AP OCD 0001 = AVR32UC OCD Other = Reserved Architecture 0000 = AVR32B 0001 = AVR32A Other = reserved
Bit Number 24
R
23
NXDWT
0
R
22
NXOT
0
R
21
NXPT
0
R
20:17
NXMDO
6
R R R
16 15:12 11:8
NXMSEO NXDB NXPCB
1 2 6
R
7:4
NXOCD
0
R
3:0
NXARCH
0
7.2.13.3
Debug Communication CPU Register (DCCPU) If the CPU wants to transmit data to the debugger tool, it writes data to the Debug Communication CPU Register using mtdr. By writing this register, a dirty bit is set in the Debug Communication Status Register. The emulator should poll the status register and read DCCPU if the dirty bit is set. Table 7-7.
R/W R/W
Debug Communication CPU Register
Field Name DATA Init. Val. 0x0000_ 0000 Description Data Value Data written by CPU
Bit Number 31:0
75
32002C-AVR32-11/07
AVR32
7.2.13.4 Debug Communication Emulator Register (DCEMU) When the emulator writes to this register, a dirty bit is set in the Debug Communication Status register. The CPU can poll this bit to see if DCEMU contains unread data.. Table 7-8.
R/W R/W
Debug Communication Emulator Register
Field Name DATA Init. Val. 0x0000_ 0000 Description Data Value Data written by Emulator
Bit Number 31:0
7.2.13.5
Debug Communication Status Register (DCSR) To avoid overruns the CPU must poll this register before writing a new value to DCCPU. Note that the bits in this register are not automatically cleared in OCD mode. This allows a debugger to update views and observe the system without accidentally modifying the DCSR register. Table 7-9.
R/W R
Debug Communication Status Register
Field Name Reserved Init. Val. 0x0000_ 0000 Description Reserved These bits are reserved, and will always read as 0 Emulator Data Dirty 0 = DCEMU has not been written to since last read from CPU. 1 = DCEMU contains a new data value. This bit is cleared by reading DCEMU. CPU Data Dirty 0 = DCCPU has not been written to since last read from emulator. 1 = DCCPU contains a new data value. This bit is cleared by reading DCCPU.
Bit Number 31:2
R/W
1
EMUD
0
R/W
0
CPUD
0
76
32002C-AVR32-11/07
AVR32
7.2.13.6 Development Control Register (DC) DC is used for basic development control of the CPU. Table 7-10.
R/W
Development Control Register
Field Name Init. Val. Description ABORT Writing ABORT to one while DBE is asserted causes the CPU to enter Debug Mode, regardless of SR:DM and any pending exceptions. If the CPU was in sleep mode, it will first be woken up before entering Debug Mode. The ABORT bit is cleared automatically when Debug Mode is entered. RES - Application Reset Writing this bit causes an application reset, which will reset the CPU and other system modules. The OCD state machines will be reset and the Transmit Queue flushed, but the OCD control and configuration registers will not be cleared. MM - Monitor Mode 1 = The CPU will enter Debug Mode in Monitor Mode 0 = The CPU will enter Debug Mode in OCD Mode Changing this bit in Debug Mode does not take effect until the CPU enters Debug Mode the next time. ORP - OCD Register Protect 0 = OCD registers can be written by any privileged CPU mode 1= OCD registers can be written only in Debug Mode RID - Run In Debug 0: Peripherals are frozen in Debug Mode 1: Peripherals keep running in Debug Mode
Bit Number
R/W
31
ABORT
0
S
30
RES
0
R/W
29
MM
0
R/W
28
ORP
0
R/W R
27 26
RID Reserved
0 0
R/W
25
TOZ
0
TOZ - Trap Opcode Zero 0: The opcode 0x0000 is executed as a normal CPU instruction 1: The opcode 0x0000 causes entry to Debug Mode IFM - Ignore First Match When written to one, a PC breakpoint on the first instruction after exiting Debug Mode with the retd instruction will not trigger re-entry to Debug Mode. Typically used when returning from a program breakpoint. This bit stays one until written to zero.
R/W
24
IFM
0
77
32002C-AVR32-11/07
AVR32
Table 7-10.
R/W
Development Control Register
Field Name Init. Val. Description IRP - Instruction Replace If IRP is written to one before exiting OCD Mode with the retd instruction, the first instruction after exiting OCD Mode will be fetched from the Debug Instruction Register. This bit is cleared automatically after this fetch takes place. This bit will not have any effect if written at the same time as RES. SQA - Software Quality Assurance 0: Regular program trace 1: SQA enhanced program trace EOS - Event Out Select 00 = No operation 01 = Emit event out when the CPU enters Debug Mode 10 = Emit event out for breakpoints/watchpoints 11 = Emit event out for message insertion into the TXQ
Bit Number
R/W
23
IRP
0
R/W
22
SQA
0
R/W
21:20
EOS
0
R
19:14
Reserved DBE - Debug Enable DBE enables Debug Mode and all debug features in the CPU. DBE must be written to one to enable breakpoints, debug requests, or single steps. DBR - Debug Request Writing DBR to one while DBE is asserted causes the CPU to enter Debug Mode. If the CPU was in sleep mode, it will first be woken up before entering Debug Mode. The DBR bit is cleared automatically when Debug Mode is entered.
R/W
13
DBE
0
R/W
12
DBR
0
11:9
Reserved SS - Single Step If SS is written to one before exiting Debug Mode with the retd instruction, exactly one instruction will be executed before returning to Debug Mode. SS stays one until written to zero by the debugger.
R/W
8
SS
0
78
32002C-AVR32-11/07
AVR32
Table 7-10.
R/W
Development Control Register
Field Name Init. Val. Description OVC[2:0] - Overrun Control OVC controls the action taken if Branch, Data, or Ownership trace messages are generated while the Transmit Queue is full. Settings 111 though 100 are reserved. 000 = Generate overrun messages 001 = Delay CPU to avoid BTM and Ownership Trace overruns 010 = Delay CPU to avoid DTM and Ownership Trace overruns 011 = Delay CPU to avoid BTM, DTM, and Ownership Trace overruns 111-100 = Reserved EIC[1:0] - EVTI Control The EIC bits control the action performed when the EVTI pin on the Nexus debug port receives a high-to-low transition. If trace is enabled, EVTI can be configured to cause a trace synchronization message. If Debug Mode is enabled, EVTI can be configured to cause a breakpoint. 00 = EVTI for program and data trace synchronization 01 = EVTI for breakpoint generation 10 = No operation 11 = Reserved TM[2:0] - Trace Mode The TM bits select which trace modes are enabled. 000 = No Trace XX1 = OTM Enabled X1X = DTM Enabled 1XX = BTM Enabled If Data or Branch tracing is triggered or stopped by a watchpoint , the DTM and BTM bits are updated accordingly.
Bit Number
R/W
7:5
OVC
0
R/W
4:3
EIC
0
R/W
2:0
TM
0
7.2.13.7
Development Status (DS) register This register is used to examine the debug state of the CPU and the cause for entering Debug Mode. Note that multiple sources may trigger Debug Mode simultaneously, causing more than one bit to be set. The register is read-only. All bits are dynamic and do not require clearing.
79
32002C-AVR32-11/07
AVR32
This register is undefined when the CPU is not in Debug Mode. Table 7-11.
R/W R
Development Status register
Field Name Reserved Init. Val. 0 NTBF -NanoTrace Buffer Full This bit is set if Debug Mode is entered because the Memory Service Unit has signalled that the NanoTrace Buffer is full. This bit is cleared when Debug Mode is exited. EXB -External Breakpoint This bit is set if Debug Mode was entered due to an event on the EVTI pin. This bit is cleared when Debug Mode is exited. DBA - Debug Acknowledge This bit is set if Debug Mode was entered due to setting the Debug Request or ABORT bit in the DC register. This bit is cleared when Debug Mode is exited. BOZ - Break on Opcode Zero This bit is set if Debug Mode was entered due to opcode 0x0000 being executed. This bit is cleared when Debug Mode is exited. INC - Instruction Complete 0: The CPU is executing one or more instructions, or is not in OCD Mode. 1: The CPU is in OCD Mode and is not executing any instructions. Description
Bit Number 31:29
R
28
NTBF
0
R
27
EXB
0
R
26
DBA
0
R
25
BOZ
0
R
24
INC
0
R
23:16
Reserved
0 BP - Breakpoint Status The BP bits identify which hardware breakpoint caused Debug Mode to be entered: BP[0]: BP0A BP[1]: BP0B BP[2]: BP1A BP[3]: BP1B BP[4]: BP2A BP[5]: BP2B BP[6]: BP3A BP[7]: BP3B These bits are cleared when Debug Mode is exited.
R
15:8
BP[7:0]
0
R
7:6
Reserved
0 DBS - Debug Status DBS is set when the CPU is in OCD Mode, otherwise cleared. This bit stays cleared also when the CPU operates in Monitor Mode.
R
5
DBS
0
80
32002C-AVR32-11/07
AVR32
Table 7-11.
R/W
Development Status register
Field Name Init. Val. Description STP - Stop Status STP is set if OCD Mode is entered from sleep mode. This bit can be used by the debugger to determine the proper return sequence from OCD Mode. This bit is cleared when OCD Mode is exited.
Bit Number
R
4
STP
0
R
3
Reserved
0 HWB - Hardware Breakpoint Status This bit is set if Debug Mode was entered due to a hardware breakpoint. The BP[7:0] bits should be examined to determine the breakpoint(s) which triggered. This bit is cleared when Debug Mode is exited. SWB - Software Breakpoint Status This bit is set if Debug Mode was entered due to a breakpoint instruction being executed. Returning from a software breakpoint may require special handling by the debugger. This bit is cleared when Debug Mode is exited. SSS - Single Step Status This bit is set when Debug Mode is entered due to a single step. This bit is cleared when Debug Mode is exited.
R
2
HWB
0
R
1
SWB
0
R
0
SSS
0
7.2.13.8
Debug Instruction Register (DINST) The Debug Instruction Register contains the instruction to be executed in OCD Mode. The CPU fetches and executes the instruction faster than they can be written by the Debug port. DINST is also used to store the instruction to replace the breakpoint instruction. Table 7-12.
R/W R/W
Debug Instruction register
Field Name DINST Init. Val. 0 Description DINST - Debug Instruction The instruction to be executed on the CPU.
Bit Number 31:0
7.2.13.9
Debug Program Counter (DPC) This register contains the PC value of the last executed instruction in any non-debug mode. This allows a debugger to sample program execution addresses for statistical purposes without interrupting the CPU. If this register is read in Debug Mode, it will reflect the last executed instruction before Debug Mode was entered. Note that several types of breakpoints trigger before an instruction is executed, so this value is not necessarily identical to RAR_DBG.
81
32002C-AVR32-11/07
AVR32
When replacing the return instruction from Debug Mode, the CPU will see the DPC value as the PC value for the executed instruction. The user only needs to write this register when replacing the return instruction from OCD Mode. Table 7-13.
R/W R/W
Debug Program Counter
Field Name DPC Init. Val. 0 Description DPC - Debug Program Counter PC of the last executed instruction
Bit Number 31:0
7.3
7.3.1
Debug Port
Overview The OCD debug port consists of the JTAG port and the AUX port. The low bandwidth JTAG port handles all register access, while the high bandwidth AUX port transfers all Nexus messages from the OCD system. The Nexus standard defines the maximum clock frequency for JTAG to be 33 MHz, and for AUX 200 MHz.
7.3.2
JTAG Access to OCD register is done through an IEEE1149.1 JTAG-port. The JTAG TAP controller is shared with the rest of the system. In order to enable access to OCD register the emulator must perform the following sequence. 1. Put the TAP controller in the state "test logic reset". 2. Insert the OCD Instruction to prepare the Debug Port to receive OCD register access. The OCD instruction is inserted using the IR scan path. 3. Use the DR scan path to insert the OCD register address and operation (Read / Write). 4. Use the DR scan path to read / write the data to / from the register. 5. Repeat 3 through 4 for every register operation. The TAP controller will remain in OCD mode until a test logic reset is detected. To be able to use JTAG-based debug tools for AVR32 without adapters, it is recommended that a circuit design using an AVR32 device should use a standard 10-pin 50-mil IDC connector with the pinout shown in Table 7-14. The signals are described in Table 7-15. Table 7-14.
Signal TCK TDO TMS N/C TDI In
AVR32 standard JTAG connector pinout. All directions relative to processor
Dir In Out In Pin 1 3 5 7 9 Pin 2 4 6 8 10 Out In Dir Signal GND VREF RESET_N N/C N/C
82
32002C-AVR32-11/07
AVR32
Table 7-15.
Pin TRST_N TCK TMS TDI TDO RESET_N VREF
JTAG signals
Direction Input Input Input Input Output Input Output Description Asynchronous reset for the TAP controller and JTAG registers Test Clock. Data is driven on falling edge, sampled on rising edge. Test Mode Select Test Data In Test Data Out Device reset Reference voltage from target. Signals should be driven relative to this voltage level.
Figure 7-4.
JTAG TAP controller state diagram.
1 Test-LogicReset 0 0 Run-Test/ Idle 1 Select-DR Scan 0 Capture-DR 0 Shift-DR 1 Exit1-DR 0 Pause-DR 1 0 Exit2-DR 1 1 Update-DR 0 1 0 1 0 0 1 Select-IR Scan 0 Capture-IR 0 Shift-IR 1 Exit1-IR 0 Pause-IR 1 Exit2-IR 1 Update-IR 0 0 1 0 1
1
1
7.3.3
AUX port The Auxiliary (AUX) port and messaging protocol follow the definitions of the Nexus standard. This standard allows varying the number of signalling pins. The following configuration is selected for AVR32UC.
83
32002C-AVR32-11/07
AVR32
* 6 data output pins (MDO) * 2 message start/end output pins (MSEO) * 1 EVTO pin * 1 EVTI pin The configuration is based on the presumed needs for bandwidth in a system being traced at 100+ MIPS, balanced against the desire to keep debug pincount low. This configuration can be changed in future implementations to allow for greater or smaller bandwidth over the AUX port. The AUX pins may be multiplexed with GPIO in a device. By default, the MCKO, MDO, and MSEO pins are tristated or used as GPIO, and the Nexus functionality must be explicitly enabled by the debugger. EVTO, EVTI, and the JTAG pins are always available to the debugger. If the AUX pins are needed for Nexus functionality in an application, it is recommended not to use these pins for GPIO purposes, as this can affect the signal integrity required for Nexus operation. The complete signal list of the AUX port is shown in Table 7-16. Table 7-16.
Auxiliary pins MCKO
Auxiliary pins
Width 1 Direct ion O Description Message Clockout (MCKO) is a free-running output clock to development tools for timing of MDO and MSEO pin functions. Message Data Out (MDO[5:0]) are output pins used for all messages generated by the device. In single datarate mode, external latching of MDO shall occur on rising edge of MCKO. In double datarate mode, external latching of MDO shall occur on both edges of MCKO. Message Start/End Out (MSEO[1:0]) pins indicate when a message on the MDO pins has started, when a variable length packet has ended, and when the message has ended. In single datarate mode, external latching of MSEO shall occur on rising edge of MCKO. In double datarate mode, external latching of MSEO shall occur on both edges of MCKO. Event Out (EVTO) is an output pin which can be configured to toggle every time a message is inserted into the Transmit Queue, when the CPU entered OCD Mode, or when a breakpoint or watchpoint hit occured, as configured by the EOS bits in the Development Control register . Event In (EVTI) is an input which, when a high-to-low transition occurs, a processor is halted (breakpoint) or program and data synchronization messages are transmitted from the OCD controller, as configured by the EIC bits in the Development Control register. System reset
MDO
6
O
MSEO
2
O
EVTO
1
O
EVTI
1
I
RESET_ N
1
I
84
32002C-AVR32-11/07
AVR32
To be able to use AUX-based debug tools for AVR32, a circuit design using an AVR32 device should use a Mictor38 connector (AMP P/N 767054-1) as defined in the Nexus standard, with the pinout shown in Table 7-17. Table 7-17.
Signal MSEO0 MSEO1 MCKO EVTO_N MDO0 MDO1 MDO2 MDO3 MDO4 MDO5
AVR32 standard Nexus connector pinout. All directions relative to processor
Dir Out Out Out Out Out Out Out Out Out Out N/C N/C N/C Pin 38 36 34 32 30 28 26 24 22 20 18 16 14 12 10 8 6 4 2 Pin 37 35 33 31 29 27 25 23 21 19 17 15 13 11 9 7 5 3 1 Out In In In In In Dir Signal N/C N/C N/C N/C N/C N/C N/C N/C TRST_N TDI TMS TCK N/C TDO RESET_N N/C N/C N/C N/C
VREF EVTI_N
Out In N/C N/C N/C N/C
7.3.3.1
Reset configuration The Nexus standard specifies that the AUX port can be enabled by keeping EVTI low while pulsing TRST (or exiting Test-Logic-Reset). The OCD system in AVR32 has removed this feature. In order to enable the AUX port, the debugger has to write the AXC:AXE (Auxiliary Enable) bit. Message protocol The OCD System implements the Auxiliary Port Message Protocol defined in the Nexus standard. The following section is merely a summary of this protocol. For details, please see the Nexus standard. Messages are composed of a Start-of-Message (SOM) token, followed by one or more packets of information, each of fixed or variable length, and ended by an End-of-Message (EOM) token. SOM/EOM and End-of-Variable-Length-Packets (EVLP) are signalled by MSEO for transmitted messages. Packet information is carried by the MDO pins. The number of MDO pins available is known as the port boundary. The information carried by the MDO and MSEO pins each cycle is known as a frame.
7.3.3.2
85
32002C-AVR32-11/07
AVR32
7.3.3.3 Message rules MDO is valid whenever MSEO does not indicate "idle". Fixed length packets are implicitly recognized from the message format, and are not required to end on a port boundary. Thus, packets may also start within a port boundary if following a fixed length packet. The end of variable length packets is identified through the MSEO pins, and to identify the end of the packet uniquely, these packets must end on a port boundary. If necessary, the packet must be stuffed with zeroes to align the end to a port boundary. Variable length packets may be truncated by omitting leading zeroes so that the packet ends on the first possible port boundary. * The MSEO pins behave the following way ("x" means "don't care"): * 0b11 followed by 0b00 indicates SOM * 0b0x followed by 0b11 indicates EOM * 0b00 followed by 0b01 indicates EVLP * MSEO is 0b00 at all other clocks during transmission of a message * MSEO is 0b11 at all clocks when idle. 7.3.3.4 Clock and frame rate In single datarate mode (default), MDO and MSEO should be sampled by an external tool on the rising edge of MCKO. In double datarate mode, the MCKO clock runs at half frequency, so MDO and MSEO should be sampled on both edges of MCKO. This is configured by the Double Datarate bit in the AUX Port Control Register. It is also possible to reduce the frequency of the AUX port compared to the CPU clock by writing the AXC:LS and AXC:DIV bits. If LS=1, the DIV value selects the frame rate of the AUX port: fAUX = fCPU/(DIV+1) If LS=1 and DIV=0, fAUX = fCPU/2. This can be combined with the single or dual datarate mode, as described above. In either case, the sampling edge will be as close to the middle of the MDO data frame as possible. The duty cycle of the MCKO clock will stay within the 40-60 duty cycle requirement of the Nexus standard for all settings apart from DIV=2. 7.3.3.5 Example Figure 7-5 shows an example of transmission of a Program Trace Indirect Branch message. The TCODE is fixed at 6 bits (=4 for PTIB), followed by a fixed-length packet (EVT-ID = 2), and a variable-length packet (I-CNT = 63). I-CNT is stuffed with zeroes to fit the port boundary. Finally, the variable packet U-ADDR (=5) is transmitted. Since this leading zeroes of this packet can be truncated, it fits within a single frame.
86
32002C-AVR32-11/07
AVR32
Figure 7-5. Example of a Nexus message transmission with single and double datarate.
IDLE SOM NORMAL EVLP EOM
MCKO (DDR=1) MCKO (DDR=0) M S E O [1 . . 0 ] M D O [ 5. . 0 ] 11 00 000100 111110 01 000011
I-CNT = 63 TCODE = 4 EVT-ID = 2 Zero stuffing U-ADDR = 5
11 000101
7.3.3.6
Transmit queue and overruns Messages from various sources are inserted in a Transmit Queue (TXQ), which stores a number of frames. This queue acts as a FIFO which allows messages to be inserted more rapidly than they can be retrieved by the emulator. The queue holds 16 frames. If more messages are inserted than there is room for in the queue, information will be lost, and an overrun situation occurs. The TXQ will block any more messages from being inserted, and allow the queue to be emptied by the emulator before allowing any more messages to be inserted. The first message to be inserted after the overrun is cleared, is an Error message, which informs the emulator that an overrun has occurred and which types of trace messages have been lost. After this, transmission continues as normal. Alternatively, the user can configure the OCD to halt the CPU to prevent overruns. This can be done selectively for different message types, and is controlled by writing to the Overrun Control (OVC) bits in the DC register. If any of the OVC bits are set, watchpoint trace messages will usually not generate TXQ overflow. However, triggering an program and data watchpoint on the same instruction may in some rare cases cause an overrun independently of the OVC settings, since a large amount of trace message data will be produced for this instruction.
7.3.3.7
Trace and reset All pending trace messages in the Transmit Queue are flushed if: the OCD is reset by a system reset; the OCD is disabled; or an application reset is triggered by writing to the DC:RES bit. Thus, if the CPU is reset, but not the OCD, the program flow can be observed by program trace. However, if the debugger resets the system, the remaining messages in the queue are of no value, and expected to be flushed. Note that if the OCD is disabled (by clearing DC:DBE or by a system reset), trace is suspended until DC:DBE is written to one. The DC:TM bits must be written simultaneously, and define which trace features should now be active. Similarly, when an application reset is triggered by writing DC:RES, the DC:TM bits are written simultaneously and define which trace features should now be active.
87
32002C-AVR32-11/07
AVR32
7.3.4 7.3.4.1 Messages Error The error message indicates various errors that can occur during trace or debugging. Table 7-19 lists the various errors that can be reported, along with the associated ECODE. If trace messages are lost because of insufficient space in the Transmit Queue, an error message is transmitted, followed by a synchronization message, as soon as space is available in the Transmit Queue. Table 7-18. Error
Direction: From target Description Error code. Refer to Table 7-19. Value = 8
Indirect Branch Message with Sync Packet Size (bits) 5 6 Packet Name ECODE TCODE Packet Type Fixed Fixed
Table 7-19.
ECODE 0b00000 0b00001 0b00010 0b00011 0b00101 0b00110 0b00111 0b01000 0b01001 0b11111
Error codes
Description Ownership trace overrun Program trace overrun Data trace overrun Reserved Watchpoint overrun. Program and/or data and/or ownership trace overrun. Program trace and/or data and/or ownership trace and/or watchpoint overrun. Reserved
7.3.5 7.3.5.1
Registers Auxiliary Port Control Register (AXC) Table 7-20 shows the description of the Auxiliary Port Control Register. This register allows greater flexibility in controlling the operation of the AUX port than specified by the Nexus stan-
88
32002C-AVR32-11/07
AVR32
dard. This includes enabling the AUX port, and controlling the speed of the clock and data compared to the CPU clock. Table 7-20.
R/W R
AUX Port Control Register
Field Name Reserved Init. Val. 0 Description Reserved These bits are reserved, and will always read as 0 AXS - Auxiliary Port Select 0: AUX port is mapped to pin configuration 0. 1: AUX port is mapped to pin configuration 1. 2: AUX port is mapped to pin configuration 2. 3: AUX port is mapped to pin configuration 3. Reserved This bit is reserved, and will always read as 0 Reserved This bit is reserved, and will always read as 0. LS - Low Speed 0:AUX port runs at the same speed as the CPU 1:AUX port runs at reduced speed compared to the CPU. DDR - Double Data Rate Setting this bit halves the MCKO rate so that MDO data must be sampled on both edges of MCKO. 1 = Double data rate mode 0 = Single datarate mode AXO - Auxiliary Port Override 0: AUX port is mapped to the pins dictated by AXS. 1: AUX port is overridden and mapped to pin configuration 1. AXE - Auxiliary Port Enable 0: AUX port is used for GPIO 1: AUX port is used for Nexus operation. This bit does not need to be written in devices with dedicated AUX pins Reserved These bits are reserved, and will always read as 0 DIV - Division factor If LS=1, the DIV value selects the frame rate of the AUX port.
Bit Number 31:16
R/W
15:14
AXS
0
R R
13 12
Reserved Reserved
0 0
R/W
11
LS
0
R/W
10
DDR
0
R/W
9
AXO
0
R/W
8
AXE
0
R
7:4
Reserved
0
R/W
3:0
DIV
0
7.4
7.4.1
Breakpoints
Overview The Nexus Recommended Register map supports up to 8 universal breakpoints. However since the AVR32UC hardware employs separate instruction and data memories, the OCD system must also separate program and data breakpoints. Any breakpoint can also be programmed as
89
32002C-AVR32-11/07
AVR32
a watchpoint. The watchpoint will trigger a Watchpoint Hit message. The OCD system supports up to six program breakpoints modules and two data breakpoint modules. In addition to this, the data trace modules can also be used as data address watchpoints. The trace watchpoints result in a vendor defined Trace Watchpoint Hit message. Figure 7-6. Breakpoint modules.
CPU
PC
Data A ddress Data V alue
Data A ddress
Program BP/WP
Data BP/WP
Trace BP/WP
90
32002C-AVR32-11/07
AVR32
Figure 7-7. Breakpoint unit overview.
PC Breakpoint Unit PC Breakpoint Module 0A
PC Breakpoint Module 0B
PC Breakpoint Module 1A
PC Breakpoint Module 1B
PC Breakpoint Module 2A
PC Breakpoint Module 2B
6 PC Breakpoints Data Breakpoint Unit Data Breakpoint Module 3A 6 PC Watchpoints
5 PC Watchpoints 2 Data Watchpoints
Data Breakpoint Module 3B
Trigger Unit
Start/ Stop Program Trace Unit
Start/ Stop Data Trace Unit 2 Range Data Watchpoints Watchpoint Message Generator Messages to Transmit Queue
2 Data Watchpoints
2 Data Breakpoints
7.4.2
Breakpoint Unit description The Breakpoint unit consists of the units shown in Figure 7-7. The PC Breakpoint Unit (PBU) handles the program counter breakpoints. The PBU can have up to 6 PC breakpoint modules that can match on a single PC. Two modules can be combined to give a match on a range of PC values, thus up to three ranges can be defined. The PBU is configured with registers Breakpoint / Watchpoint Control (BWC) and Breakpoint / Watchpoint Address (BWA) 0A, 0B, 1A, 1B, 2A, and 2B. The Data Breakpoint Unit handles data breakpoints. The data breakpoints can be configured with the BWC / BWA / BWD 3A and 3B registers.
91
32002C-AVR32-11/07
AVR32
The Watchpoint Message Generator (WMG) generates watchpoint messages for all breakpoint modules and data trace watchpoints. Optionally, a breakpoint or watchpoint can be signalled by a pulse on the EVTO pin. This requires DC:EOS bits to be set to 1 and EOC in the corresponding Breakpoint/Watchpoint Control Register must be written to one. 7.4.2.1 Program Breakpoints In order to enable a simple program breakpoint the Breakpoint / Watchpoint Address (BWA) and Breakpoint / Watchpoint Control (BWC) registers for that breakpoint must be updated. The BWA register must be written with the address of the instruction where the debugger wants to halt. The BWC must have the Breakpoint / Watchpoint Enable (BWE) field set to breakpoint. Program breakpoints break on the instruction pointed to by BWA. The instruction will cause a debug exception and the Debug Mode Link Register (RAR_DBG) and Debug Mode Return Status Register (RSR_DBG) will point to the instruction that caused the debug exception. The Development Status register will also be updated to indicate which breakpoint caused the exception. In OCD Mode the debug tool can then feed the CPU with debug code to ascertain the state of the processor. In OCD Mode the breakpoint modules are disabled. Upon return from Debug Mode, the PC and SR will be restored from the RAR_DBG and RSR_DBG and the instruction that caused the debug exception will be fetched again. If the program breakpoint has not been disabled in Debug Mode, the Ignore First Match (IFM) bit in the Development Control (DC) register must be written to one to avoid triggering another breakpoint on the first instruction after exiting Debug Mode. The IFM bit prevents any Program Breakpoint operation on the first instruction after exiting Debug Mode. 7.4.2.2 Watchpoints When enabled in the BWC, a watchpoint message is sent when the instruction address matches the address stored in BWA. If both a Trace watchpoint and a Watchpoint triggers at the same time, the Trace watchpoint will be ignored and only a Watchpoint Hit message will be generated. Note that Program, Data, and Trace watchpoints are generated at different pipeline stages and will not be synchronized when the messages are generated. A Program Watchpoint on a load store instruction will hit before a data watchpoint on the same instruction. 7.4.2.3 Data Breakpoints Data Breakpoint modules listen on the data address and data value lines between the CPU and the data cache and can halt the CPU, or send a watchpoint message, if the address and / or value meets a stored compare value. Unlike program breakpoints, data breakpoints halt on the next instruction after the load / store instruction that caused the breakpoint has completed. The BWA register must be written with the address of the data the debugger wants to halt on. 7.4.3 7.4.3.1 Data Breakpoint interface Data alignment The AVR32 can read or write data in bytes, halfwords, or words. The same data location can be accessed through either operation, e.g. a byte location can be accessed as part of a double word. The data bus operations seen by the OCD system are always aligned, i.e. halfwords start on halfword boundaries, word accesses start on word boundaries, as illustrated in Figure 7-8. If 92
32002C-AVR32-11/07
AVR32
the data bus operation is a double word load / store, the breakpoint module will see the word data value which corresponds to the address in BWA. One data breakpoint module can only compare 32 bits of data. The data to be matched can therefore not cross a word boundary if the data breakpoint is to match correctly. When the debugger wants to match on a byte or halfword, the BWD register must be written with the LSB aligned, and the BWC:BME bits must be set to mask the upper bits of the BWD register. For example, if the debugger wants to match against Byte 1 in Figure 7-8, the BWA must be set to the byte address of Byte 1 and the BWD written with the value to match on aligned to LSB. Also the BWC:BME must be set to mask the 24 most significant bits of the BWD register (BME = 0xE). By default, the data breakpoint module will match on the data value regardless of the size of the access. The data BWC can also be set to match on a specific access size if the SIZE bits are set. The debugger can for example, set the breakpoint module to match only on byte writes to byte 1 in Figure 7-8. The BWD register must still be aligned correctly, and the byte mask must be set, but the data breakpoint will only trigger if a single byte is written to byte 1 and not if, for example, a whole word is written to byte 0, 1, 2, and 3. Figure 7-8. Memory access data alignment.
Double w ord byte 0 to 3 Double w ord byte 4 to 7 Word Half w ord 0 Byte 0 3 Byte 1 2 Half w ord 1 Byte 2 1 Byte 3 0 0x8010 0x800C 0x8008 0x8004 0x8000 Word Address
Byte A ddress
7.4.4
Triggering trace A watchpoint from the program or data breakpoint modules can be used to start or stop program or data trace. This is done using a trigger unit. The trigger unit can be configured using the watchpoint trigger register. When the trigger unit is set to start trace upon a watchpoint, DC:TM will be set accordingly, and trace will then be enabled. If a data watchpoint enables data trace, the data event is not included in the data trace output, while an event which disables data trace is included in the data trace output.
93
32002C-AVR32-11/07
AVR32
7.4.5 7.4.5.1 Messages Watchpoint Hit (WH)
Table 7-21.
Watchpoint Hit
Direction: From target Packet Type Description XXXXXXX1 = Watchpoint 0 matched XXXXXX1X = Watchpoint 1 matched ... X1XXXXXX = Watchpoint 6 matched 1XXXXXXX = Watchpoint 7 matched Value = 15
Watchpoint Message Packet Size Packet Name
8
WPHIT
Fixed
6
TCODE
Fixed
7.4.5.2
Trace Watchpoint Hit (TWH)
Table 7-22.
Trace Watchpoint Hit
Direction: From target Description X1 = Watchpoint 0 matched 1X = Watchpoint 1 matched Value = 56
Trace Watchpoint Message Packet Size 2 6 Packet Name WPHIT TCODE Packet Type Fixed Fixed
7.4.6 7.4.6.1
Registers PC Breakpoint/Watchpoint Address registers (BWA0A, BWA0, BWA1A, BWA1B, BWA2A, BWA2B) The 6 BWA registers contains one instruction address each. The address can be used for a single breakpoint match or used as bitwise mask to create a range. Table 7-23.
R/W R/W
PC BWAnx Register
Field Name BWA Init. Val. 0 Description Breakpoint/Watchpoint Address
Bit Number 31:0
94
32002C-AVR32-11/07
AVR32
7.4.6.2 PC Breakpoint/Watchpoint Control registers - (BWC0A, BWC0B, BWC1A, BWC1B, BWC2A, BWC2B)
Table 7-24.
R/W
PC BWCnx Register
Field Name Init. Val. Description BWE - Breakpoint / Watchpoint Enable 00 = Disabled 01 = Breakpoint enabled 10 = Reserved 11 = Watchpoint enabled Reserved AME - Address Mask Enable This bit is only present in BWCxA registers. 0 = Disabled. 1 = Enabled. BWAxB will be used to bitwise mask the PC compare according to this function: BP A: (PC & BWA_B) == (BWA_A & BWA_B) BP B: Will never trigger Reserved EOC - EVTO Control 0 = Breakpoint/watchpoint status indication is not output on EVTO 1 = Breakpoint/watchpoint status indication is output on EVTO Reserved
Bit Number
RW
31:30
BWE
00
R
29:26
Reserved
0
RW
25
AME
0
R
24:15
Reserved
0
RW
14
EOC
0
R
13:0
Reserved
0
7.4.6.3
Data Breakpoint / Watchpoint Address (BWA3A, BWA3B)
Table 7-25.
R/W RW
Data Breakpoint/Watchpoint address (BWA3x) register
Field Name BWA Init. Val. 0x00000000 Description Address of data for breakpoint or watchpoint generation.
Bit Number 31:0
7.4.6.4
Data Breakpoint / Watchpoint Data (BWD3A, BWD3B)
Table 7-26.
R/W RW
Data Breakpoint/Watchpoint data (BWD3x) register
Field Name BWD Init. Val. 0x00000000 Description Data value for breakpoint or watchpoint generation.
Bit Number 31:0
95
32002C-AVR32-11/07
AVR32
7.4.6.5 Data Breakpoint / Watchpoint Control (BWC3A, BWC3B)
Table 7-27.
R/W
Data Breakpoint / Watchpoint Control (BWC3x)
Field Name Init. Val. Description BWE - Breakpoint / Watchpoint Enable 00 = Disabled 01 = Breakpoint enabled 10 = Reserved 11 = Watchpoint enabled BRW - Breakpoint/Watchpoint Read/Write Select 00 = Break on read access 01 = Break on write access 10 = Break on any access 11 = Reserved Reserved BME - Breakpoint/Watchpoint Data Mask 1XXX = Mask bits 31:24 in BWD X1XX = Mask bits 23:16 in BWD XX1X = Mask bits 15:8 in BWD XXX1 = Mask bits 7:0 in BWD Reserved BWO - Breakpoint/Watchpoint Operand 1X = Compare with BWA value X1 = Compare with BWD value Reserved EOC - EVTO Control 0 = Breakpoint/watchpoint status indication not output on EVTO 1 = Breakpoint/watchpoint status indication is output on EVTO Reserved SIZE - Size bits to match 0xx = Disregard access size (Default) 100 = Byte access 101 = Halfword access 110 = Word access 111 = Reserved Reserved
Bit Number
RW
31:30
BWE
00
RW
29:28
BRW
00
R
27:24
Reserved
00
RW
23:20
BME
0x0
R RW R
19:18 17:16 15
Reserved BWO Reserved
00 000 0
RW
14
EOC
0
R
13:12
Reserved
0
R/W
11:9
SIZE
000
R/W
8:0
Reserved
0
96
32002C-AVR32-11/07
AVR32
7.4.6.6 Watchpoint Trigger
Table 7-28.
R/W
WT, Watchpoint Trigger Register
Field Name Init. Val. Description PTS - Program Trace Start 000 = Trigger disabled 001 = Program watchpoint 0b 010 = Program watchpoint 1a 011 = Program watchpoint 1b 100 = Program watchpoint 2a 101 = Program watchpoint 2b 110 = Data watchpoint 3a 111 = Data watchpoint 3b PTE - Program Trace End 000 = Trigger disabled 001 <-> 111 Watchpoint selected as for PTS DTS - Data Trace Start 000 = Trigger disabled 001 <-> 111 Watchpoint selected as for PTS DTE - Data Trace End 000 = Trigger disabled 001 <-> 111 Watchpoint selected as for PTS Reserved
Bit Number
R/W
31:29
PTS
000
R/W
28:26
PTE
000
R/W
25:23
DTS
000
R/W R
22:20 19:0
DTE Reserved
000 -
7.5
7.5.1
Program trace
Program trace overview The AVR32 OCD system provides program trace support via the debug port. The program trace feature implements a Program Flow Change Model in which the program trace is synchronized at each program flow discontinuity. This occurs at taken indirect branches and exceptions. A record of taken / not taken direct branches is included so that the complete program flow can be decoded. The development tool can then interpolate what transpires between each program trace message by correlating information from branch target messaging and static source or object code files. Self-modifying code cannot be traced with the Program Flow Change Model because the source code is not static. The TM[2] bit in the Development Control register must be set to enable program trace.
7.5.1.1
Branch message summary Five types of branch messages can be generated: 1. Program Trace, Indirect Branch is transmitted on most subroutine calls, returns, interrupts, exceptions, and any situation where the target address of a branch cannot be determined from the source code. This message contains the instruction count to identify the branch and the target PC to identify the branch target. 2. Program Trace Synchronization is transmitted to indicate the current PC after starting trace or after trace synchronization is lost. 97
32002C-AVR32-11/07
AVR32
3. Program Trace, Indirect Branch messages with sync contain both instruction count and PC, and are transmitted instead of a Program Trace Synchronization message if a synchronization condition occurs and the current instruction is a taken direct/indirect branch. 4. Program Trace, Resource full messages is transmitted when an internal buffer overflows. ICNT is transmitted whenever it overflows with this message. 5. Program Trace Correlation. This message is transmitted to synchonize the program trace with an event. Sent when trace is disabled, debug mode is entered or sleep mode is entered. The Nexus standard also specifies Program Trace Correction messages to correct for speculatively transmitted trace messages, but these are not implemented in the AVR32, since program trace messages are only transmitted for actually executed instructions. Similarly, the Nexusspecified CANCEL packet of synchronized branch messages is not implemented in AVR32. Entry into Debug Mode will generate an program trace correlation message, while no trace messages are generated while executing in Debug Mode. A Program Trace Synchronization message is transmitted when Debug Mode is exited. 7.5.2 Branch message packets The program trace messages contain packets which identify the address of the taken branch, the target of the branch, and the current program counter value. These packets are discussed below. Instruction count packet In several of the program trace messages, an Instruction Count (I-CNT) packet is included, to identify the number of sequentially executed instruction units since the last program trace message. In AVR32, this figure refers to bytes, i.e. compact instructions count two bytes and extended instructions are four bytes. The following rules apply to instruction counts: * A taken indirect branch which generates a trace message is not included in the instruction count. * An indirect branch which is not taken is included in the instruction count. * Speculatively fetched instructions are not counted until they are actually executed. * The instruction counter is reset every time a program trace message is generated. 7.5.2.2 Compressed program counter packets To save bandwidth, the Nexus messages employ compressed versions of the program counter address. These include: U-ADDR = StripLeadingZeros (Previous sent addr xor uncompressed address from pipeline). F-ADDR = Full target address for a taken branch. Leading zeroes may be truncated. 7.5.3 7.5.3.1 Special cases Debug Mode When entering Debug Mode, a PTC message is generated with EVCODE = 0. When exiting Debug Mode, a PTSY message is generated. If the instruction also generates a branch message, the branch message with sync (i.e. PTDBS or PTIBS) is generated instead of
7.5.2.1
98
32002C-AVR32-11/07
AVR32
PTSY. In this case, the address of the instruction which generated the branch message can not be explicitly reconstructed from the trace log, but the debugger will normally know which address was returned to when Debug Mode was exited. If a breakpoint occurs on the first instruction after exiting Debug Mode, a PTC message with EVCODE = 0 is generated. 7.5.4 7.5.4.1 Messages Program Trace, Direct Branch This message is output by the target processor whenever there is a change of program flow caused by a conditional or unconditional branch. The instruction count (I-CNT) is included to identify the branch address. The following AVR32 instructions can cause a direct branch: Table 7-29.
Mnemonic br{cond3} br{cond4} rjmp Compact Branch if condition satisfied. Extended Compact Branch if condition satisfied.
Direct branch instructions
Description
Table 7-30.
Direct Branch message without sync
Direction: From target Packet Type Variable Fixed Description Number of bytes executed since the last taken branch. Value = 3
Direct Branch Message Packet Size (bits) 8 6 Packet Name I-CNT TCODE
7.5.4.2
Program Trace, Direct Branch with Target Address This message is transmitted instead of the Direct Branch message when SQA enhanced program trace is enabled by writing DC:SQA to one. This simplifies real-time PC reconstruction in the emulator for real-time code coverage and performance analysis purposes. Table 7-31. Direct Branch message with Target Address
Direction: From target Description The unique portion of the branch target address for a taken indirect branch or exception. Most significant bits that have a value of 0 are truncated. Number of bytes executed since the last taken branch. Value = 57
Direct Branch Message with Sync Packet Size (bits) 32 8 6 Packet Name U-ADDR I-CNT TCODE Packet Type Variable Variable Fixed
7.5.4.3
Program Trace, Indirect Branch An indirect branch is output by the target processor whenever there is a change of program flow caused by a subroutine call, return instruction, interrupt, or exception.
99
32002C-AVR32-11/07
AVR32
Messages for taken indirect branches and exceptions include how many sequential bytes were executed since the last taken branch or exception, and the unique portion of the branch target address or exception vector address. The unique portion of the branch is found by doing an exclusively or on the branch target and the last sent UADDR / FADDR. Additionally, the cause of the indirect branch is identified through an Event ID packet. Operations causing indirect branches and their corresponding EVT-ID are shown below. Table 7-32.
Description Exception entry Subroutine call Branch via register contents Return
Operations causing indirect branch messages
Operation Exception, interrupts (0 to 3), NMI, entry to Debug Mode acall, icall, mcall, jcall, scall, rcall instruction Any mov (except mov pc, lr) or load (except popm/ldm) with PC as destination. Any arithmetic instruction with PC as destination. ret{cond4}, rete, rets, retj, (mov pc, lr), popm/ldm loading PC EVT-ID 3 2 1
0
Note that subrotine returns are often accomplished by a mov pc, lr, popm or ldm instruction with PC included in the argument list. This generates an EVT-ID of 0 instead of 1.. Table 7-33. Indirect branch message without sync
Direction: From target Packet Type Variable Variable Description The unique portion of the branch target address for a taken indirect branch or exception. Most significant bits that have a value of 0 are truncated. Number of bytes executed since the last taken branch. Cause of indirect branch: 3: Exception entry 2: Call 1: Branch via register contents 0: Return Value = 4
Indirect Branch Message Packet Size (bits) 32 8 Packet Name U-ADDR I-CNT
2
EVT-ID
Fixed
6
TCODE
Fixed
7.5.4.4
Program Trace Synchronization This message is output by the PTU when any of the following conditions occurs: 1. Upon exit from reset. This is required to allow the number of instruction units executed packet in a subsequent Program Trace Message to be correctly interpreted by the tool. 2. When program trace is enabled during normal execution of the embedded processor. 3. Upon exit from a power-down state. This is required to allow the number of instruction units executed packet in a subsequent Program Trace Message to be correctly interpreted by the tool. 4. Upon exiting from Debug Mode. 5. An overrun condition had previously occurred in which one or more branch trace occurrences were discarded by the target processor's debug logic.To inform the tool that an overrun condition occurred, the target outputs an Error Message (TCODE = 8) with an 100
32002C-AVR32-11/07
AVR32
ECODE value of 00001 or 00111 immediately prior to the Program Trace Synchronization Message. 6. A debug control register field specifies that EVTI pin action is to generate program trace synchronization, and the Event-In (EVTI) pin has been asserted. 7. Upon overflow of the sequential instruction unit counter. 8. After 256 branch messages without sync.
Table 7-34.
Program Trace Synchronization Message
Direction: From target Description The full current instruction address. Most significant bits that have a value of 0 are truncated. Number of bytes executed since the last taken branch. Value = 9
Program Trace Sync Message Packet Size (bits) 32 8 6 Packet Name PC I-CNT TCODE Packet Type Variable Variable Fixed
7.5.4.5
Program Trace, Direct Branch with Sync If a Program Trace Synchronization message occurs on an instruction which transmits a direct branch message, the Direct Branch with Sync message is transmitted instead of the Program Trace Synchronization message. The Direct Branch with Sync message contains the instruction count referring to the taken branch, as well as the complete PC value of the branch target. The format for direct branch messages with sync is shown below. The AVR32 OCD system never issues speculative branch messages and there is therefore no CANCEL packet. Table 7-35. Direct Branch message with Sync
Direction: From target Description The full target address for a taken direct branch. Most significant bits that have a value of 0 are truncated. Number of bytes executed since the last taken branch. Value = 11
Direct Branch Message with Sync Packet Size (bits) 32 8 6 Packet Name F-ADDR I-CNT TCODE Packet Type Variable Variable Fixed
7.5.4.6
Program Trace, Indirect Branch with Sync If a Program Trace Synchronization message occurs on an instruction which transmits an indirect branch message, the Indirect Branch with Sync message is transmitted instead of the Program Trace Synchronization message. The Indirect Branch with Sync message contains the instruction count referring to the taken branch, as well as the complete PC value of the branch target.
101
32002C-AVR32-11/07
AVR32
The format for indirect branch messages with sync is shown below. The AVR32 OCD system never issues speculative branch messages and there is therefore no CANCEL packet. Table 7-36. Indirect Branch message with Sync
Direction: From target Description The full target address for a taken direct branch. Most significant bits that have a value of 0 may be truncated. Number of bytes executed since the last taken branch. Cause of indirect branch: 3: Exception entry 2: Call 1: Branch via register contents 0: Return Value = 12
Indirect Branch Message with Sync Packet Size (bits) 32 8 Packet Name F-ADDR I-CNT Packet Type Variable Variable
2
EVT-ID
Fixed
6
TCODE
Fixed
7.5.4.7
Program Trace, Resource Full This message is output whenever an internal resource (sequential instruction counter) has reached its maximum value. To avoid losing information when this resource becomes full, the Resource Full message is transmitted. The information from this message is added with information from subsequent messages to interpret the full picture of what has transpired. Multiple Resource Full messages can occur before the arrival of the message that the information belongs with. Table 7-37. Resource Full message
Direction: From target Description Number of bytes executed since the last taken branch. Resource Code. This code indicates which internal resource has reached its maximum value. Refer to Table 7-38 for details. Value = 27
Program Trace, Resource Full Packet Size (bits) 8 4 6 Packet Name RDATA RCODE TCODE Packet Type Variable Fixed Fixed
Table 7-38.
Resourc e Code 0b0000 0b0001 0b1111
Resource Code (RCODE) description
Resource Program Trace - Sequential Instruction Counter Reserved Data Packet Value Number of instruction units executed since the last taken branch.
102
32002C-AVR32-11/07
AVR32
7.5.4.8 Program Trace Correlation Program Trace Correlation messages are used to correlate events to the program flow that may not be associated with the instruction stream (e.g. Data Trace Messages). The occurrence of an event listed in Table 7-39 will cause this message to be transmitted. Table 7-39. Program Trace Correlation message
Direction: From target Packet Type Variable Fixed Fixed Description Number of instruction units executed since the last taken branch. Event Code. Refer to Table 7-40. Value = 33
Program Trace Correlation Packet Size (bits) 8 4 6 Packet Name I-CNT EVCODE TCODE
Table 7-40.
Event Code (EVCODE) 0b0000 0b0001
Event Code (EVCODE) description
Event Description Entry into Debug Mode Entry into Low Power Mode Reserved Program Trace Disabled Reserved
0b0010 - 0b0011 0b0100 0b0101 - 0b1111
7.5.5
Registers Program trace is enabled using the TM field in the Development Control register.
7.6
7.6.1
Data Trace
Overview The AVR32 OCD system provides data trace via the AUX port. The CPU data memory accesses can be monitored real-time using the Nexus class 2+ compliant Data Trace Unit. Both reads and writes can be traced. Data Trace information is transmitted through data trace messages, which can be of read or write type, with or without sync. The messages contain information about the data address and value which triggered the trace. Data addresses can be complete (with sync), or compressed relative to the previous transmitted message (without sync). The value contains the data value read or written from the data cache, and is of the same width as the access size (byte, halfword, word, or doubleword). The TM[1] bit in the Development Control register must be set to enable data trace. It is also possible to trigger data trace using watchpoints. In this case, TM[1] will be set or cleared automatically.
103
32002C-AVR32-11/07
AVR32
7.6.2 Using data trace channels as watchpoints Data Trace is enabled for address ranges (trace channels) specified by pairs of Data Trace Start and End Address registers (DTSA/DTEA). Each data access within that boundary will generate an action as specified by the corresponding bits in the Data Trace Control register (DTC). The AVR32 OCD system currently supports two data trace channels. While each channel can be used to trigger data trace messages, it is also possible to trigger watchpoint messages, providing flexibility when using the OCD system. Watchpoints can be ranged, i.e. trigger on all accesses between DTSA through DTEA, or trigger on a single location, if DTSA and DTEA are written to the same value. Writing TnWP to one enables a watchpoint on accesses for data trace channel n. The watchpoint message is sent as a vendor defined trace watchpoint message. It is possible to enable both trace and watchpoint on the same channel, but typically, only one of the options will be used. 7.6.3 Messages The Trace Watchpoint Hit message is described in Section 7.4.5.2 on page 94. 7.6.3.1 Data Trace, Data Write (DTDW) This message is output by the target processor when it detects a memory write that matches the OCD system's data trace attributes.
Table 7-41.
Data Trace, Data Write message
Direction: From target Description The data value written. The size will vary depending on the load / store instruction being traced. The unique portion of the data write address, which is relative to the previous Data Trace Message (read or write). Data size: 00 = 8 bits 01 = 16 bits 10 = 32 bits Value=5
Data Trace, Data Write message Packet Size 8 / 16 / 32 32 Packet Name DATA U-ADDR Packet Type Variable Variable
2
DSZ
Fixed
6
TCODE
Fixed
7.6.3.2
Data Trace, Data Write with Sync (DTDWS) This message is an alternative to the Data Trace, Data Write Message. It is output instead of a Data Trace, Data Write Message whenever a memory write occurs that matches the debug logic's data trace attributes, and when one of the following conditions has occurred: 1. The processor has exited from reset. This synchronization message is required to allow the unique portion of the data write address of following Data Trace, Data Write Messages to be correctly interpreted by the tool. 2. When data trace is enabled during normal execution of the embedded processor.
104
32002C-AVR32-11/07
AVR32
3. Upon exit from a power-down state. This synchronization message is required to allow the unique portion of the data write address of following Data Trace, Data Write Messages to be correctly interpreted by the tool. 4. The Event-In pin has been asserted and a debug control register field specifies that EVTI pin action is to generate data trace synchronization. 5. An overrun condition had previously occurred in which one or more data trace occurrences were discarded by the target processor's debug logic. To inform the tool that an overrun condition occurred,the target outputs an Error Message (TCODE = 8) with an ECODE value of 00010 or 00111 immediately prior to the Data Trace, Data Write with Sync Message. 6. The Data Trace Message counter has expired indicating that at most 256 without-sync versions of Data Trace Messages have been sent since the last with-sync version. 7. A data write is detected following the processor exiting from Debug Mode.
Table 7-42.
Data Trace, Data Write with Sync message
Direction: From target Description The data value written. The size will vary depending on the load / store instruction being traced. The full address of the memory location written. Most significant bits that have a value of 0 are truncated. Data size: 00 = 8 bits 01 = 16 bits 10 = 32 bits Value=13
Data Trace, Data Write with Sync message Packet Size 8 / 16 / 32 32 Packet Name DATA F-ADDR Packet Type Variable Variable
2
DSZ
Fixed
6
TCODE
Fixed
7.6.3.3
Data Trace, Data Read (DTDR) This message is output by the target processor when it detects a memory read that matches the OCD system's data trace attributes. Table 7-43. Data Trace, Data Read message
Direction: From target Description The data value read. The size will vary depending on the load / store instruction being traced. The unique portion of the data read address, which is relative to the previous Data Trace Message (read or write). Data size: 00 = 8 bits 01 = 16 bits 10 = 32 bits Value=6
Data Trace, Data Read message Packet Size 8 / 16 / 32 32 Packet Name DATA U-ADDR Packet Type Variable Variable
2
DSZ
Fixed
6
TCODE
Fixed
105
32002C-AVR32-11/07
AVR32
7.6.3.4 Data Trace, Data Read with Sync (DTDRS) This message is an alternative to the Data Trace, Data Read Message. It is output instead of a Data Trace, Data Read Message whenever a memory read occurs that matches the debug logic's data trace attributes, and when one of the following conditions has occurred: The processor has exited from reset. This synchronization message is required to allow the unique portion of the data write address of following Data Trace, Data Read Messages to be correctly interpreted by the tool. When enabling data trace is during normal execution of the embedded processor. Upon exit from a power-down state. This synchronization message is required to allow the unique portion of the data write address of following Data Trace, Data Read Messages to be correctly interpreted by the tool. The Event-In pin has been asserted and a debug control register field specifies that EVTI pin action is to generate data trace synchronization. An overrun condition had previously occurred in which one or more data trace occurrences were discarded by the target processor's debug logic. To inform the tool that an overrun condition occurred, the target outputs an Error Message (TCODE = 8) with an ECODE value of 00010 or 00111 immediately prior to the Data Trace, Data Read with Sync Message. The periodic Data Trace Message counter has expired indicating that 255 without-sync versions of Data Trace Messages have been sent since the last with-sync version. A data read is detected following the processor exiting from Debug Mode. Table 7-44. Data Trace, Data Read with Sync message
Direction: From target Description The data value read. The size will vary depending on the load / store instruction being traced. The full address of the memory location written. Most significant bits that have a value of 0 are truncated. Data size: 00 = 8 bits 01 = 16 bits 10 = 32 bits Value=14
Data Trace, Data Read with Sync message Packet Size 8 / 16 / 32 32 Packet Name DATA F-ADDR Packet Type Variable Variable
2
DSZ
Fixed
6
TCODE
Fixed
106
32002C-AVR32-11/07
AVR32
7.6.3.5 Data Trace, Read-Modify-Write (DTRMW) This message is generated when a Read-Modify-Write (RMW) instruction is generated with a target address within an active data trace window. These instructions have the format "memc/s/t imm, bp", and can clear, set, or toggle a specified bit in memory. Table 7-45. Data Trace, Read-Modify-Write message
Direction: From target Description The unique portion of the data write address, which is relative to the previous Data Trace Message (read or write). Bit argument of the RMW instruction. Bit operation: 00 = Reserved 01 = Clear 10 = Set 11 = Toggle Value=58
Data Trace, Read-Modify-Write message Packet Size 32 5 Packet Name U-ADDR BIT Packet Type Variable Variable
2
TYPE
Fixed
6
TCODE
Fixed
7.6.3.6
Data Trace, Read-Modify-Write with Sync (DTRMWS) This message is output instead of DTRMW under the same conditions as shown for DTDWS. Table 7-46. Data Trace, Read-Modify-Write with Sync message
Direction: From target Description The full address of the memory location written. Most significant bits that have a value of 0 are truncated. Bit argument of the RMW instruction. Bit operation: 00 = Reserved 01 = Clear 10 = Set 11 = Toggle Value=59
Data Trace, Read-Modify-Write with sync message Packet Size 32 5 Packet Name F-ADDR BIT Packet Type Variable Variable
2
TYPE
Fixed
6
TCODE
Fixed
107
32002C-AVR32-11/07
AVR32
7.6.4 7.6.4.1 Registers Data Trace Control register (DTC) This register controls actions taken on data accesses within all data trace channels. Table 7-47.
R/W
Data Trace Control Register
Bit Number Field Name Init. Val. Description RWT0 - Read/Write Trace channel 0 00 = No trace enabled x1 = Enable data read trace 1x = Enable data write trace RWT1 - Read/Write Trace channel 1 00 = No trace enabled x1 = Enable data read trace 1x = Enable data write trace
R/W
31:30
RWT0
0
R/W
29:28
RWT1
0
R R/W R/W
27:2 1 0
Reserved T1WP T0WP
0 0 0 T1WP - Trace Channel 1 Watchpoint T0WP - Trace Channel 0 Watchpoint
7.6.4.2
Data Trace Start/End Address register (DTSA/DTEA) DTSAn and DTEAn define the inclusive data access range [DTSAn : DTEAn] for trace channel n. Each trace channel 0 and 1 has its own DTSA/DTEA register pair. If DTSA=DTEA, the trace channel will match on accesses to a single location. If DTSA>DTEA, no match will occur for the trace channel. DTSA0, DTSA1 Table 7-48.
R/W R/W
Data Trace Start Address Register
Bit Number 31:0 Field Name DTSA Init. Val. 0 Description DTSA - Start address for trace visibility
DTEA0, DTEA1 Table 7-49.
R/W R/W
Data Trace End Address Register
Bit Number 31:0 Field Name DTEA Init. Val. 0 Description DTEA - End address for trace visibility
7.7
7.7.1
Ownership Trace
Functional description The AVR32 OCD system implements Ownership Trace in compliance with the Nexus standard. Ownership trace provides a macroscopic view, such as task flow reconstruction, when debugging software written in a high level (or object oriented) language. It offers the highest level of abstraction for tracking operating system software execution. This is especially useful when the developer is not interested in debugging at lower levels. Ownership trace is especially important for embedded processors with a memory management unit, in which all processes can use the same virtual program and data spaces. Ownership trace 108
32002C-AVR32-11/07
AVR32
offers development tools a mechanism to decipher which set of symbolics and sources are associated for lower levels of visibility and debugging. Ownership trace information is transmitted out the AUX using an Ownership Trace Message. OTM facilitates ownership trace by providing visibility of which process ID or operating system task is activated. An Ownership Trace Message is transmitted to indicate when a new process/task is activated, allowing development tools to trace ownership flow. Additionally, an Ownership Trace Message is also transmitted periodically during runtime at a minimum frequency of every 256 Program Trace or Data Trace Messages. In the AVR32, this feature is supported through an Ownership Trace Register, which automatically produces an Ownership Trace Message when written to. The RTOS scheduler routine writes the new process ID to this register during process switching using the mtdr instruction. The TM[0] bit in the Development Control register must be set to enable ownership trace. 7.7.2 7.7.2.1 Messages Ownership Trace (OT) * The ownership trace message is sent: * When the Ownership Trace Process ID (PID) register is written. * When program trace with sync message is generated due to overflow in the periodic message counter. * When a data trace with sync message is generated due to overflow in the periodic message counter. * After a Transmit Queue overrun if the CPU has written to PID when the queue was full. If there is no room in the Transmit Queue for the message, and the CPU is not halted to prevent overruns, an error message is produced. Table 7-50. Ownership Trace Message
Direction: From target Description Task / process ID. Value = 2
Ownership Trace Message Packet Size 32 6 Packet Name PROCESS TCODE Packet Type Fixed Fixed
7.7.3 7.7.3.1
Registers Ownership Trace Process ID (PID) The CPU should write the current Process ID value to this register, whenever the RTOS performs a process switch. This will automatically create an Ownership Trace Message to be transmitted to the tool. This register can be written from any privileged CPU mode.
109
32002C-AVR32-11/07
AVR32
The tool can read and write this register, although it is recommended that only the CPU writes this register. Table 7-51.
R/W RW
Ownership Trace Process ID (PID)
Field Name PROCESS Init. Val. 0 Description PROCESS - Process ID The unique Process ID number of the currently running process.
Bit Number 31:0
7.8
OCD Message Summary
Table 7-52. Message Summary
Public / Vendor Defined Public
TCODE 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16-26 27 28-32 33 34-55 56 57 58
Message Debug Status (DEBS) Reserved Ownership Trace (OT) Program Trace, Direct Branch (PTDB) Program Trace, Indirect Branch (PTIB) Data Trace, Data Write (DTDW) Data Trace, Data Read (DTDR) Reserved Error (ERROR) Program Trace Synchronization (PTSY) Reserved Program Trace, Direct Branch with Sync (PTDBS) Program Trace, Indirect Branch with Sync (PTIBS) Data Trace, Data Write with Sync (DTDWS) Data Trace, Data Read with Sync (DTDRS) Watchpoint Hit (WH) Reserved Program Trace Resource Full (PTRF) Reserved Program Trace Correlation (PTC) Reserved Trace Watchpoint Hit (TWH) Direct Branch with Target Address (DBTA) Data Trace, Read-Modify-Write (DTRMW)
Page page 73
Public Public Public Public Public
page 109 page 99 page 99 page 104 page 105
Public Public
page 88 page 100
Public Public Public Public Public
page 101 page 101 page 104 page 106 page 94
Public
page 102
Public
page 103
Vendor Vendor Vendor
page 94 page 99 page 107
110
32002C-AVR32-11/07
AVR32
Table 7-52. Message Summary
Public / Vendor Defined Vendor Vendor Vendor
TCODE 59 60-62 63 (0x3F)
Message Data Trace, Read-Modify-Write with Sync (DTRMWS) Reserved Vendor Defined Extension Message Reserved
Page page 107
Table 7-52 shows the messages which can be transmitted by the target on the AUX port. OCD registers can be written by the tool using the JTAG mechanism described in "Debug Port" on page 82. Table 7-53 shows the format of the transmitted messages. Packets shown in bold are variable length, the others are fixed length. All variable length packets can be truncated by omitting leading zeroes, but will always end on a port boundary. Table 7-53. Message formats
Message format Nexus Message Debug Status Ownership Trace Error Program Trace, Direct Branch Program Trace, Direct Branch with Target Address Program Trace, Indirect Branch Program Trace Synchronization Program Trace, Direct Branch with Sync Program Trace, Indirect Branch with Sync Program Trace Resource Full Program Trace Correlation Data Trace, Data Write TCODE [5:0] 0 2 8 3 Packet 1 STATUS[31:0] PROCESS [31:0] ECODE[4:0] I-CNT[7:0] Packet 2 Packet 3
57 I-CNT[7:0] 4 9 EVT-ID[1:0] U-ADDR[31:0] I-CNT[7:0] PC[31:0]
-
U-ADDR[31:0] -
I-CNT[7:0]
11 I-CNT[7:0] 12 EVT-ID[1:0] I-CNT[7:0] 27 33 5 RCODE[3:0] EVCODE[3:0] DSZ[1:0] RDATA[7:0] I-CNT[7:0] U-ADDR[31:0] F-ADDR[31:0]
-
F-ADDR[31:0]
DATA[31:0]
111
32002C-AVR32-11/07
AVR32
Table 7-53. Message formats
Message format Nexus Message Data Trace, Data Read Data Trace, ReadModify-Write Data Trace, Data Write with Sync Data Trace, Data Read with Sync Data Trace, ReadModify-Write with Sync Watchpoint Hit Trace Watchpoint Hit TCODE [5:0] 6 58 13 14 Packet 1 DSZ[1:0] TYPE[1:0] DSZ[1:0] DSZ[1:0] Packet 2 U-ADDR[31:0] BIT[4:0] Packet 3 DATA[31:0] U_ADDR[31:0] DATA[31:0] DATA[31:0]
F-ADDR[31:0] F-ADDR[31:0] BIT[4:0]
59 15 56
TYPE[1:0] WPHIT[7:0] WPHIT[1:0]
F_ADDR[31:0] -
7.9
OCD Register Summary
Use the index shown in the "Register index" column when accessing OCD registers by the Nexus access mechanism (see Section 7.3.2 on page 82).Use the index shown in the "mtdr/mfdr index" column when accessing OCD registers by mtdr/mfdr instructions from the CPU (see Section 7.2.10 on page 72). These indexes are identical to the register index multiplied by 4. Table 7-54.
Register Index 0 1 2 3 4 5-6 7 8 9 10 11 12 13 14-15
OCD Register Summary
mtdr/mf dr index 0 4 8 12 16 20-24 28 32 36 40 44 48 52 56-60 Register Device ID (DID) Reserved Development Control (DC) Reserved Development Status (DS) Reserved Reserved Reserved Reserved Reserved Watchpoint Trigger (WT) Reserved Data Trace Control (DTC) Data Trace Start Address (DTSA) Channel 0 to 1 Access Type R -- R/W -- R -- -- -- -- -- R/W -- R/W R/W Page
page 74
page 77
page 79
page 97
page 108 page 108
112
32002C-AVR32-11/07
AVR32
Table 7-54.
Register Index 16-17 18-19 20-21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40-65 64 65 66 67 68 69 70 71 72-74 75 76 77- 255
OCD Register Summary
mtdr/mf dr index 64-68 72-76 80-84 88 92 96 100 104 108 112 116 120 124 128 132 136 140 144 148 152 156 160-260 256 260 264 268 272 276 280 284 288-296 300 304 3081020 Register Reserved Data Trace End Address (DTEA) Channel 0 to 1 Reserved PC Breakpoint/Watchpoint Control 0A (BWC0A) PC Breakpoint/Watchpoint Control 0B (BWC0B) PC Breakpoint/Watchpoint Control 1A (BWC1A) PC Breakpoint/Watchpoint Control 1B (BWC1B) PC Breakpoint/Watchpoint Control 2A (BWC2A) PC Breakpoint/Watchpoint Control 2B (BWC2B) Data Breakpoint/Watchpoint Control 3A (BWC3A) Data Breakpoint/Watchpoint Control 3B (BWC3B) PC Breakpoint/Watchpoint Address 0A (BWA0A) PC Breakpoint/Watchpoint Address 0B (BWA0B) PC Breakpoint/Watchpoint Address 1A (BWA1A) PC Breakpoint/Watchpoint Address 1B (BWA1B) PC Breakpoint/Watchpoint Address 2A (BWA2A) PC Breakpoint/Watchpoint Address 2B (BWA2B) Data Breakpoint/Watchpoint Address 3A (BWA3A) Data Breakpoint/Watchpoint Address 3B (BWA3B) Breakpoint/Watchpoint Data 3A (BWD3A) Breakpoint/Watchpoint Data 3B (BWD3B) Reserved Nexus Configuration (NXCFG) Debug Instruction Register (DINST) Debug Program Counter (DPC) Reserved Debug Communication CPU Register (DCCPU) Debug Communication Emulator Register (DCEMU) Debug Communication Status Register (DCSR) Ownership Trace Process ID (PID) Reserved Reserved AUX port Control (AXC) Reserved Access Type -- R/W -- R/W R/W R/W R/W R/W R/W R/W R/W R/W R/W R/W R/W R/W R/W R/W R/W R/W R/W -- R R/W R/W -- R/W R/W R/W R/W -- -- R/W -- Page
page 108
page 95 page 95 page 95 page 95 page 95 page 95 page 96 page 96 page 94 page 94 page 94 page 94 page 94 page 94 page 95 page 95 page 95 page 95
page 74 page 81 page 81
page 75 page 76 page 76 page 109
page 88
113
32002C-AVR32-11/07
AVR32
8. Revision History
Doc. Rev. Date Comments
32002C
2007-11-19
MPU compilant with revision 2 of AVR32 Architecture. Added cycle counts for new instruction in version 2 of the CPU. Added COUNT/COMPARE system register reset-on-match. Added CPU Local Bus. Reconfigured OCD AXC register.
32002B 32002A
2007-08-03 2007-03-30
Added Memory Service Unit description. Added description of peripheral behavior in Debug. Initial revision.
114
32002C-AVR32-11/07
AVR32
Table of contents
1 Introduction .............................................................................................. 2
1.1 1.2 1.3 1.4 1.5 1.6 1.7 The AVR family ...............................................................................................2 The AVR32 Microprocessor Architecture .......................................................2 Exceptions and Interrupts ...............................................................................3 Java Support ...................................................................................................3 Microarchitectures ...........................................................................................3 The AVR32UC architecture ............................................................................4 AVR32UC CPU revisions ................................................................................5
2
Programming Model ................................................................................ 6
2.1 2.2 2.3 2.4 2.5 2.6 2.7 Architectural compatibility ...............................................................................6 Implementation options ...................................................................................6 Register file configuration ...............................................................................6 The Status Register ........................................................................................6 System registers .............................................................................................9 COMPARE and COUNT registers ................................................................14 Configuration Registers ................................................................................14
3
Pipeline ................................................................................................... 17
3.1 3.2 3.3 3.4 3.5 3.6 3.7 3.8 3.9 3.10 3.11 Overview .......................................................................................................17 Prefetch unit ..................................................................................................17 Decode unit ...................................................................................................17 EX pipeline stage ..........................................................................................18 Support for unaligned addresses ..................................................................19 Forwarding hardware and hazard detection .................................................19 Event handling ..............................................................................................19 Special concerns ...........................................................................................21 Entry points for events ..................................................................................23 Interrupt latencies .........................................................................................34 NMI latency ...................................................................................................35
4
Memory System ..................................................................................... 36
4.1 4.2 4.3 4.4 Memory sections ...........................................................................................36 Memory interfaces ........................................................................................37 IF stage interface ..........................................................................................37 EX stage interfaces .......................................................................................37
i
32002C-AVR32-11/07
AVR32
4.5 4.6 IRAM Write buffer .........................................................................................39 Memory barriers ............................................................................................39
5
Memory Protection Unit ........................................................................ 41
5.1 5.2 5.3 Memory map in systems with MPU ...............................................................41 Understanding the MPU ................................................................................41 Example of MPU functionality .......................................................................45
6
Instruction Cycle Summary .................................................................. 46
6.1 6.2 6.3 6.4 6.5 6.6 6.7 6.8 6.9 6.10 6.11 6.12 6.13 6.14 6.15 6.16 6.17 6.18 6.19 Definitions .....................................................................................................46 Special considerations ..................................................................................46 CPU revision .................................................................................................46 ALU instructions ............................................................................................46 Multiply instructions .......................................................................................49 MAC instructions ...........................................................................................50 MulMac64 instructions ..................................................................................51 Divide instructions .........................................................................................52 Saturate instructions .....................................................................................52 Load and store instructions ...........................................................................52 Multiple data memory access instructions ....................................................56 Branch instructions .......................................................................................57 Call instructions .............................................................................................57 Return from execution mode instructions .....................................................57 Swap instructions ..........................................................................................58 System register instructions ..........................................................................58 System control instructions ...........................................................................58 Read-modify-write instructions ......................................................................58 Code example ...............................................................................................59
7
OCD system ............................................................................................ 61
7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 Overview .......................................................................................................61 CPU Development Support ...........................................................................64 Debug Port ....................................................................................................82 Breakpoints ...................................................................................................89 Program trace ...............................................................................................97 Data Trace ..................................................................................................103 Ownership Trace .........................................................................................108 OCD Message Summary ............................................................................110
ii
32002C-AVR32-11/07
AVR32
7.9 OCD Register Summary .............................................................................112
8
Revision History ................................................................................... 114
iii
32002C-AVR32-11/07
Headquarters
Atmel Corporation 2325 Orchard Parkway San Jose, CA 95131 USA Tel: 1(408) 441-0311 Fax: 1(408) 487-2600
International
Atmel Asia Room 1219 Chinachem Golden Plaza 77 Mody Road Tsimshatsui East Kowloon Hong Kong Tel: (852) 2721-9778 Fax: (852) 2722-1369 Atmel Europe Le Krebs 8, Rue Jean-Pierre Timbaud BP 309 78054 Saint-Quentin-enYvelines Cedex France Tel: (33) 1-30-60-70-00 Fax: (33) 1-30-60-71-11 Atmel Japan 9F, Tonetsu Shinkawa Bldg. 1-24-8 Shinkawa Chuo-ku, Tokyo 104-0033 Japan Tel: (81) 3-3523-3551 Fax: (81) 3-3523-7581
Product Contact
Web Site www.atmel.com Technical Support avr32@atmel.com Sales Contact www.atmel.com/contacts
Literature Requests www.atmel.com/literature
Disclaimer: The information in this document is provided in connection with Atmel products. No license, express or implied, by estoppel or otherwise, to any intellectual property right is granted by this document or in connection with the sale of Atmel products. EXCEPT AS SET FORTH IN ATMEL'S TERMS AND CONDITIONS OF SALE LOCATED ON ATMEL'S WEB SITE, ATMEL ASSUMES NO LIABILITY WHATSOEVER AND DISCLAIMS ANY EXPRESS, IMPLIED OR STATUTORY WARRANTY RELATING TO ITS PRODUCTS INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT. IN NO EVENT SHALL ATMEL BE LIABLE FOR ANY DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE, SPECIAL OR INCIDENTAL DAMAGES (INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF PROFITS, BUSINESS INTERRUPTION, OR LOSS OF INFORMATION) ARISING OUT OF THE USE OR INABILITY TO USE THIS DOCUMENT, EVEN IF ATMEL HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Atmel makes no representations or warranties with respect to the accuracy or completeness of the contents of this document and reserves the right to make changes to specifications and product descriptions at any time without notice. Atmel does not make any commitment to update the information contained herein. Unless specifically provided otherwise, Atmel products are not suitable for, and shall not be used in, automotive applications. Atmel's products are not intended, authorized, or warranted for use as components in applications intended to support or sustain life.
(c) 2007 Atmel Corporation. All rights reserved. Atmel (R), logo and combinations thereof, AVR (R) and others are registered trademarks or trademarks of Atmel Corporation or its subsidiaries. Other terms and product names may be trademarks of others.
32002C-AVR32-11/07

▲Up To Search▲

Price & Availability of AVR32UC

	To Download AVR32UC Datasheet File
If you can't view the Datasheet, Please click here to try to view without PDF Reader .